To

EXPERIMENTALISTS 

Though the mists of life 
obscure the distant scene 
they are undismayed for, 
in markings near about, 
they discern the contour 
of the land and the portal 
to the future.




PREFACE 

An alternative title for this book which would 
emphasize the point of view is Statistics, Its
Philosophy and Method. This work is more than a
revision of the writer’s earlier Statistical 
Method (1923). That work covered a field which 
in the last twenty years has grown to such magni- 
tude that nothing short of an encyclopedia could 
cover it today. The endeavor herein has been to 
place a great emphasis upon the logic and prin- 
ciples underlying the statistical study of phe- 
nomena, to provide, in the early chapters, such 
basic issues as will integrate thoughtful and 
investigative moods with statistical processes, 
and, in the later chapters, to give such treat- 
ment of modern processes as is required in hand- 
ling many experimental situations and as will 
open to the reader the wealth of thought in cur- 
rent statistical literature. 

The early chapters constitute an elementary 
text, and call for no mathematical background 
other than that of arithmetic and elementary 
algebra. They seek, first, to show the place of 
statistics in the social process of solving prob-
lems and, second, to provide the most basic
tools. The approach to statistics of most be-
ginning students is that it holds some useful
tricks which, with a certain drudgery, can be
learned. The content of statistics is not a bag 
of tricks, but rational thought processes which 
serve in problem situations. That there is a 
nicety in these processes gives zest to their 
discovery and pleasure in their use. This view- 
point can scarcely be overemphasized, and it is 
hoped that teachers of this text will further 
emphasize and elaborate upon the utility of sta- 
tistical concepts in the everyday problems of the 
physical and biological scientist, the schoolman, 
the doctor, the farmer, the economist, and the 
businessman. 

The parallel work of the author, Statistical 
Tables, is designed specifically for laboratory
statisticians and is not a requisite to students 
of this text in view of the abridged set of 
tables given herein. 

In addition to the numerous acknowledgments 
cited throughout this text, the writer is deeply 
indebted to Eric F. Gardner for statistical re- 
search in connection with sequential analysis and 
otherwise, and he is also greatly indebted to him
and to Katherine Ytredal for expert assistance in 
the difficult task of preparation of tables and 
of the manuscript for the printer. 

Truman Lee Kelley 

Cambridge, Massachusetts 
FUNDAMENTALS OF STATISTICS




CHAPTER 1 

THE DIGNITY OF DATA AND THE BACKGROUND
OF STATISTICS

SECTION 1 

STATISTICS, PSYCHOLOGY, AND LOGIC 

An isolated fact is an unthinkable phenomenon. 
Adults experience them and think of them after 
the occurrence, but not before. Further experi-
ences of the same sort (the impossibility of such
in an absolute sense is admitted, for a second 
experience cannot be exactly the same as a first) 
are no longer isolated. Infants presumably ex- 
perience them in great number, making each new 
day richer than the past in ability to interpret. 
These new infringements upon consciousness are a 
part of life which lies outside the field of 
statistics. Note the appropriateness of the 
plural, and that "statistic" is capable of defi-
nition only after the plural has been defined.
As a second experience of something, say a cat 
caterwauling, is different from the first, so 
the third is different from the second, etc.,
ad infinitum. There is no identity in successive 
experiences. In some sense each is isolated 
from all others, but in another and a psycho- 
logical sense it is not, and it is this latter 
fact that provides the basis for statistics. 
Let us attempt to trace the delimitation of
attention of a person in an investigative mood. 
We, of course, cannot catch him at the beginning 
of any activity. At the time we first see him 
he is already possessed of many ideas, experi- 
ences, remembrances, interests, and attitudes, 
dating from the more or less remote past, and he 
is in a particular environment providing par- 
ticular problems and data. The nature of the 
investigation that he undertakes is limited by 
these things. If on a country road, and if in 
his background there is the training, the inter- 
est, and the specific information of a botanist, 
he may narrow his attention to the botanical 
features around, whereas if a geologist he may 
turn his attention to rocks and soils. This 
narrowing of the field of interest is necessary 
to contemplative and creative thinking. In sta- 
tistical language it is the first step in the 
making of a statistical series. The items of 
such a series do not encompass all that exists, 
or even all that is knowable, about the things 
observed, but only so much about each as is ap- 
parent when the attention is limited to a speci- 
fied narrow field. 

A statistical series is a succession of ob- 
served values referring to a single character of 
a number of objects, things, or cases, which have 
been selected according to some single principle. 
In the situation wherein these cases are randomly 
drawn members of a much larger, generally infi- 
nite, parent population, the series is also a 
sample.

The establishment of the single principle of 
selection is the first step in methodical
investigation. One cannot and should not try to
investigate everything at once. To attempt it is 
scatterbrained and unproductive. The original 
limits of interest of the botanist on the country 
road might be to living things and thence to 
vegetation. A further limitation might be to 
trees. A still further one to maple trees, and 
thence to one characteristic of them, say leaves, 
and thence to, say, the veining of leaves. If, 
after following this greater and greater delimi- 
tation to a certain point, the investigator stops 
and then observes that there remains a variable 
phenomenon, say the number of veins, of a number 
of things, leaves, all falling within the limits 
of selection as he has defined them, he then has 
a statistical series, which in this illustration 
is also a sample, for the leaves noted may be 
thought of as a finite random selection from an 
infinite population of maple leaves. 
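The delimitation just described can be pictured as two filters applied in turn: one principle of selection (which things are admitted) and one principle of observation (which single character is recorded of each). The sketch below is only an illustration under stated assumptions: the `Item` record, its fields, the helper `statistical_series`, and the roadside data are all hypothetical, and the equality test on `kind` stands in for what is, in practice, a human judgment.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Item:
    """A hypothetical thing noticed along the country road."""
    kind: str             # what the observer takes the thing to be
    living: bool
    veins: Optional[int]  # number of veins; recorded only for leaves


def statistical_series(items: List[Item],
                       select: Callable[[Item], bool],
                       observe: Callable[[Item], object]) -> list:
    """Apply one principle of selection, then one principle of observation."""
    return [observe(item) for item in items if select(item)]


# Hypothetical roadside observations.
roadside = [
    Item("maple leaf", True, 7),
    Item("pine tree", True, None),
    Item("maple leaf", True, 9),
    Item("granite boulder", False, None),
    Item("maple leaf", True, 8),
]

# First delimitation: living things only.
living = [item for item in roadside if item.living]

# Final delimitation: maple leaves, observing a single character (vein count).
veins = statistical_series(living,
                           select=lambda i: i.kind == "maple leaf",
                           observe=lambda i: i.veins)
print(veins)  # [7, 9, 8]
```

Stopping after the first filter leaves a heterogeneous collection; only when `observe` records the same character for every admitted item does a series of comparable values result.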

Had he stopped earlier in the process of de- 
limitation, say at the point when his interest 
was merely limited to living things, would he 
then have had a statistical series? The items 
might be the height of a pine tree, the bark of 
a dog, the speed of a pony, etc. Though these 
be under a single principle (living things as
encountered along the highway), it is not enough,
for there is also needed a single principle of 
observation. Had he observed the height of a 
pine tree, the height of a dog, of a pony, etc.,
we would call it a statistical series and also a 
sample, intractable though it may be in revealing 
any interesting or unifying fact of life. 

If we attempt to define statistical series 
and samples more narrowly than stated we run into 
logical difficulties. If we require that the
elements of a sample shall be in truth observa- 
tions of a single sort, made upon things of a 
single sort selected in some single manner, we 
have narrowed it to the point of impossibility 
of exact fulfillment. What are phenomena of a
single sort and what is selection of a single 
sort? 

Let us consider a sample coming about as
closely as we can conceive to meeting these con-
ditions: the lengths of clam shells collected
upon a certain beach. Are the recorded lengths 
the lengths of a single phenomenon? Is the se- 
lection of the successive shells made in a single 
manner? Are the shells clam shells? If the 
characteristics of two species, say clams and 
snails, are so different that there is no over- 
lapping in regard to some character, we can 
certainly avoid getting snail shells when col- 
lecting clam shells, but many a problem of prac- 
tical statistics is not of this sort. It may 
well be that some shells other than those origi- 
nally thought of or defined as clam shells may 
be included in the sample. In the last analysis 
the safeguard against this lies in someone's 
judgment that shell A, all things considered, is
of the same species as shell B. We may call this
crucial act the judgment of sameness.

Next consider length: what is the length of
a clam shell? Here is one which is chipped at 
the end and we discard it, but here is one with 
reference to which it is hard to say whether it 
has been chipped or not, and we are again forced 
to depend upon a judgment of sameness. 

Now consider the selection of shells under 
similar conditions. Shell A is collected at a 
certain time and space, and this identical time 
and space does not provide us with another shell. 
Shell A is collected at high, and shell B at low 
tide. Are these collected in some single manner? 
Clearly no. All we can ever assert is that the 
items have been collected under conditions judged 
the same so far as factors that we judge material 
are concerned. Stated otherwise, there has been 
a judgment of irrelevance of those conditions 
of sampling that have been observed to vary from
item to item. Under optimum conditions we can 
assert that the collector's threshold of differ- 
ence recognition, as it pertains to the con- 
ditions of sampling, is not exceeded by the 
several experiences yielding the observations of 
his sample, but we may not claim more than this. 
The logical issues connected with sampling, 
probability, and truth have been thoroughly 
discussed by von Mises (1939). 

Mathematical statistics can sidestep all these 
issues by simply assuming that a true sample is 
at hand. This is never the case. To realize 
that applied statistics does not rest upon pure 
logic may be a disappointment to the reader, but 
it is because it does not that it concerns itself 
with phenomena, not noumena, and that it is
adaptable to all the problems of life in their 
partly repetitive aspects, which are their chief 
aspects. All the fine-spun deductions of mathe- 
matical statistics are by some amount wide of 
the mark when applied to real phenomena. Logi- 
cally this must be so. They are astray by an 
amount which judgment alone bears witness to. 
How wide and what of it are questions that are 
very difficult of quantitative answer. As these 
difficulties are analyzed they regularly run 
back to the question of the soundness of some 
judgment of sameness or of relevance. 

Consider the last. The geologist finds a 
certain fossil at 10 a.m. and a second at 3 p.m. 
of a certain day. It may not occur to him that 
the time of finding is pertinent to any intrinsic 
characteristic of the fossils and he neglects 
mentioning the time. Whether overt or not, this 
is a judgment that the time is immaterial. 
The non-overt judgments of the novice fre-
quently lead to insidious and serious error. If
the first specimen is found in a gravel bed and
the second in a clay bank, he undoubtedly would 
consider these space conditions as material and 
he would not report the items as of a single 
sample. The effective judgment may thus be 
either a positive or a negative act. Sampling 
errors due to this latter are very insidious and 
are committed by the man of narrow vision who 
blandly steps in where wiser men refrain to 
tread. 

The hazard of the positive act of judgment, 
which the collector of a sample makes, is of a 
very different and lesser sort. The judgment of 
relevance, if made consciously and defended, may 
be one of those judgments which mankind can make 
with considerable competence. Consider the 
following demands upon judgment, assuming no 
instrumental aids in making the judgments: 
(a) is A twice as tall as B? (b) is A 1.5 times
as tall as B? (c) is A 1.1 times as tall as B?
(d) is A just as tall as B? The accuracy of the
respective judgments is a matter of psychology, 
which would probably inform us that the accuracy 
of the sameness judgment is far greater than that 
of any of the other judgments. Certainly in the 
matter of pitch discrimination a very small 
difference, say that between 256 and 256.1 vi- 
brations a second, can be distinguished when the 
two sounds are equally loud and heard under 
equally favorable conditions and differ in time 
from each other by a very short interval. Here 
the query is, "Are the two the same?" and the 
answer can be given with a precision far exceed- 
ing the judgment that A is some multiple of B. 
Though we lack a general proof we may well be- 
lieve that the sameness judgment is the most pre- 
cise judgment within the power of the human mind 
to make. 

In view of this we need not be unduly dis-
turbed by the fact that the pertinence of the
beautiful techniques of mathematical statistics
to an actual problem is contingent upon some
earlier act or acts of judgment. There is a real
and important difference between mathematical 
statistics and the applied statistics with which 
the scientist is concerned in his attempt to 
discover or verify principles and laws underlying 
observable phenomena. 

There is a quadruple alliance inherent in 
sound statistical research: phenomena, i.e.,
data; logic, i.e., mathematics; human psychology 
in its power to judge sameness; and human psy- 
chology in its power to appraise relevance. If 
this total activity is to be at its best, the 
data must be germinal, the mathematical elabo- 
ration sound and penetrating, the judgment of 
sameness made with reference to things that the 
mind has primary faculties for sensing, and the 
judgment of relevance made in the light of an 
imagination so rich that few possibilities and
connections are overlooked. 

We may not assert with reference to any data 
collected to throw light upon an unsolved issue 
that it merely awaits the expert to bring to 
fruit the important germ within. Though good 
judgment determined the data selected, it may 
still be barren. Also, prior to study, we cannot 
assert that any new data consequent to causes 
not known a priori are void of virtue, but one
large class does seem peculiarly sterile. In 
dealing with biological and social phenomena 
samples not composed of cases competitive with 
reference to the issues studied are, character- 
istically, barren of important relationships.





SECTION 2

HISTORICAL ANTECEDENTS OF MODERN STATISTICS

Modern statistics is the outgrowth of a number 
of lines of social and scientific development. 
Because of this mixed ancestry, and because the 
convergence of these lines into a single disci- 
pline, statistics, is none too obvious, there 
are strong lines of cleavage in present proc- 
esses, and even in avowal as to the function of 
statistical method. Historically we find trade 
and vital statistics developing with the growth 
of business, particularly of international trade, 
and with national concern for welfare of citi- 
zens, their availability for the army, and their 
racial and vocational characteristics. The 
comprehensive work The History of Statistics 
(1918), edited by John Koren, is an excellent
picture of this growth. The topics treated of
in this work indicate the extent and detail
reached through this line of development. Here
are a few only of the topics falling under the 
first four letters of the alphabet under the 
general heading "United States." 


Accidents
Agriculture census
Banks, statistics of
Birth statistics
Census Office
Children's Bureau
Church statistics
Civil War Veterans
Coal trade
Commerce Commission, Interstate
Commerce, Department of
Commerce, foreign
Commerce, internal
Corporations
Cotton
Crime
Crop Estimates
Death statistics
Defective classes
Delinquency
Divorce statistics
Duties

Social phenomena have teemed with classifiable
facts, and the student, whether tradesman or 
legislator, has realized that the careful col- 
lection of these facts, their assorting into 
independent or related categories, for successive 
intervals of time, was informative of the course 
of events, enabling an understanding and control 
of them not otherwise possible. This line of 
statistical development may be characterized as 
a response to the welling of social phenomena. 
With this as a starting point the technique of 
the accountant in recording transactions in money 
or things of monetary value is taken over and 
adapted to the recording of amounts in categories 
other than monetary. The good bookkeeper becomes 
the good statistician, and that is so. 

But this is only part of the story of modern 
statistics. Let us turn to Studies in the History
of Statistical Method, by Helen M. Walker (1929).
The prospectus of this book states that it "pre- 
sents the modern use of statistics against a 
background of the work of De Moivre, Bernoulli, 
Gauss, Laplace, Quetelet, Galton, Ebbinghaus, 
Fechner, and many others." Here we find differ- 
ent names, different topics, and a radically 
different emphasis, as witness the following 
entries occurring under the first four letters 
of the alphabet in the table of contents: 


Airy
Anthropologists, statistical work of
Association
Astronomers, contribution to the theory of probability
Attributes
Average
Bernoulli, Jacques
Bernoulli, N.
Bessel
Binomial expansion
Boas
Bowley
Bravais
Charlier
Chi-square
Contingency
Correlation
Correlation, partial
Correlation, multiple
Curvilinear regression
Differences
Distribution

A most elementary acquaintance with the names
and topics here listed shows a vast difference 
in approach from that previously mentioned.
Modern statistics is a very real fusing of varied 
antecedents which can be analyzed into much 
greater detail than into just the two particular 
lines mentioned, which we may call the social 
and the scientific backgrounds. In the scientific 
background we find developments in each of the 
sciences progressing more or less independently 
of each other. There is a statistics of eco- 
nomics, of life insurance, of biology, of psy- 
chology, of engineering, of physics and chemistry, 
of astronomy, and of mathematics. The develop- 
ments in each of these fields are continually 
running into related fields, but each has its 
taint, such that a trained statistician unfa- 
miliar with a given foreign language could almost 
certainly pick up a statistical work written in 
that language and allocate it to its appropriate 
field merely by observing the formulas upon the 
pages. 


SECTION 3

OCCASIONS FOR RESORT TO STATISTICS


Differences in logical antecedents: A very
fundamental difference depends upon an underlying 
difference in the logical issue that is set. We 
may say that there are two occasions for resort 
to statistical procedure, the one dominated by a 
desire to prove a hypothesis, and the other by a 
desire to invent one. This has led to distinct 
schools of statisticians, both lying within the 
general field of scientific endeavor. 


The first school is that represented by mathe- 
maticians who start with certain elementary 
principles and deduce therefrom facts of distri- 
bution, frequency, and relationship. In so far 
as observed situations parallel these conclusions,
the same elementary principles are supported as 
applying to the data in hand. One weakness of 
this approach lies in the fact that a number of
causes — different sets of elementary princi- 
ples — may result in substantially the same net 
result. A still greater weakness is that it is 
essentially a deductive procedure and relatively 
sterile in suggesting new causes — in inspiring 
creative inferences. It is fundamentally a
method of proof and not one of invention; and 
just because it is a method of proof, it has a 
permanent place in statistical method. It must, 
however, if in the service of the social and 
biological sciences, be but a handmaid to the 
creative genius of mathematical synthesis and 
induction. 

The second school is best represented by those 
biometricians and economists who start with ob- 
served data and endeavor so to group them and 
treat them that the constant features of the data 
are made apparent. This is a process of sta- 
tistical analysis. It may at times be expected 
to be an involved process, for social phenomena 
are complex. Data are frequently warped to fit 
statistical convenience, but if statistics is to 
realize its high destiny, procedure must be 
flexible, for only when the method is mobile can 
it fit immobile data. The accurate measurement 
of those features of phenomena which are ex- 
ceptional, but not chance, is the unique province 
of statistical analysis. 

The present treatment follows the second 
school, for the starting point of every problem 
is data, actual data and not hypothetical. When 
data are found to approach a hypothetical form, 
as for example at times normality of distri-
bution, then the beautiful relationships of the
theoretical or ideal form are availed of to the 
utmost, but it is intended that the treatment 
will never lose the characteristic of its origin, 
and that an observed agreement with objective
phenomena will always be considered proof of
soundness of the technique pursued. 

The method of approach in this text is in- 
ductive, starting with data and deriving con- 
stants, and will not give the noumenal satis- 
faction that comes from tossing coins, throwing 
dice, and sorting cards, thus obtaining distri- 
butions which approach an ideal standard. 

Psychological warrant for statistics: It is 
probably true that in the social experiment of 
living difficult procedures are a sort of last 
resort. If the issues of life can be solved by 
logic, why use any other method? If the issues 
can be solved by the nearly exact procedures of 
chemical or physical analysis, why go to the 
bother of dealing with many samples? If the 
biological laws of inheritance are given by 
knowable and unalterable laws of cause and effect, 
why investigate general tendencies of mating and 
of family relationship? In each instance there 
is no sufficient reason. The facts, however, 
seem to show that life does not reveal itself 
with such certainty. Even in chemistry and 
physics, only approximate answers to problems are
given in exact and concise terms; the inexact-
ness in observed fulfillment is due not only
to "impurity" of the data at hand but also to
the fact that logical and mathematical formu- 
lations are characteristically, probably uni- 
versally, but approximations to reality. 

Where the approximations are close and where 
the data are, or can by selection or experimental 
refinement be made, relatively pure, the adequacy 
of the "principle," the "law," and the "formula" 
is greatest and statistical techniques are in 
least demand. Technology well represents this 
field as does that upon which it is built, the 
"established" facts of science. However, in the 
process of establishing such facts a very
different attitude was present. The physicist observes
seemingly irregular changes in X as Y changes. He 
repeats his experiment, controlling more and more 
of the conditions, and repeats again and again, 
and, if successful, reaches a law at the end of 
his work. He has been using statistics, though 
the engineer who utilizes his law does not. Sta- 
tistics are inherently connected with the formu- 
lative stages of knowledge. This is true with re- 
gard to statistics as accumulations of repeated 
observations, and as techniques for discovering 
the characteristic features of such accumulations. 
Statistics and experimental research are wedded. 
In the case of the physicist introducing one after 
another greater degree of control over his data, 
it is especially to be noted that the stimulus to 
do so has been consequent to antecedent statis- 
tics, i.e., variability in several or many obser- 
vations in which variability was not expected. In 
cutting down this variability through the greater 
control of conditions, he may say that he is 
avoiding statistical techniques, but the fact is 
that statistical considerations have been the 
very cause of these later steps. In one sense we 
may say that the main purpose of statistics has 
been to eliminate itself. This is a sound view. 
Statistics is a research instrument and drops out 
of the picture to the degree that the problem 
becomes solved. However, as long as undetermined 
variability in outcome is present, the statistical 
aspects of the situation remain. In this sense 
most of the important problems of social life are
never solved, but are only in the process of 
being solved. 

This can be adequately expressed in terms of 
the total and the part variances entering into a 
statistical situation. We will illustrate the 
meaning of variance with the 20 hypothetical 
temperatures at a single weather station, as
listed in Table I A. 
TABLE I A

X = actual temperature; X̄ = predicted temperature; e = error of prediction

          Temperatures       Deviations from Means  Squares of Deviations  Products
            X      X̄      e        x      x̄      e       x²     x̄²     e²       x̄e

           74     73      1        4      3      1       16      9      1        3
           74     74      0        4      4      0       16     16      0        0
           66     66      0       -4     -4      0       16     16      0        0
           76     73      3        6      3      3       36      9      9        9
           70     73     -3        0      3     -3        0      9      9       -9
           72     73     -1        2      3     -1        4      9      1       -3
           66     67     -1       -4     -3     -1       16      9      1        3
           74     72      2        4      2      2       16      4      4        4
           70     72     -2        0      2     -2        0      4      4       -4
           70     67      3        0     -3      3        0      9      9       -9
           72     72      0        2      2      0        4      4      0        0
           70     68      2        0     -2      2        0      4      4       -4
           64     67     -3       -6     -3     -3       36      9      9        9
           68     67      1       -2     -3      1        4      9      1       -3
           68     70     -2       -2      0     -2        4      0      4        0
           66     68     -2       -4     -2     -2       16      4      4        4
           72     70      2        2      0      2        4      0      4        0
           68     68      0       -2     -2      0        4      4      0        0
           72     70      2        2      0      2        4      0      4        0
           68     70     -2       -2      0     -2        4      0      4        0

Sum      1400   1400      0        0      0      0      200    128     72        0
Mean       70     70      0        0      0      0     10.0    6.4    3.6        0
The predicted values are those made by the
forecaster 24 hours earlier and based upon all the
meteorological information available to him at
that time. Thus $\bar{X}$ is that which is known about
the situation and $e$ is that which is unknown.
For each item the actual temperature, $X$, equals
the predicted temperature, $\bar{X}$, plus the error of
prediction, $e$; i.e., $X = \bar{X} + e$. The same holds
when dealing with deviations from means, for
$x = X - M_X$ and $\bar{x} = \bar{X} - M_{\bar{X}}$, and $e = e - 0$ (the mean
error being zero), so that $x = \bar{x} + e$.

The quantity $e$, being the error in the pre-
diction, is in the illustrative data of Table I A,
as it universally must be, independent of the
prediction $\bar{x}$. Otherwise expressed, as later
proven in connection with the derivation of a
correlation coefficient, the sum of the $\bar{x}e$
products is equal to zero. In this case the
variance of the $\bar{x}$'s (their mean square) plus the
variance of the $e$'s exactly equals the variance
of the $x$'s. We note that 10.0 = 6.4 + 3.6 and,
in general, $V_x = V_{\bar{x}} + V_e$. This important property
of divisibility into independent additive parts
of this particular measure (the variance) of 
variability is not a property of any other measure 
of variability and is the basic reason why this 
measure facilitates thinking as does no other 
measure of variability. 
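The additivity can be checked directly on the figures of Table I A. Squaring $x = \bar{x} + e$ and summing over the 20 days gives $\sum x^2 = \sum \bar{x}^2 + 2\sum \bar{x}e + \sum e^2$; because $\sum \bar{x}e = 0$, dividing by $N$ leaves $V_x = V_{\bar{x}} + V_e$. The Python sketch below recomputes the three variances from the (predicted temperature, error) pairs as read from Table I A, using the divisor $N$ as in the table's mean squares; the names `TABLE_I_A` and `variance` are illustrative only. It also shows that standard deviations, by contrast, do not add.

```python
import math
from statistics import fmean

# (predicted temperature, error of prediction) for the 20 days of Table I A.
# The actual temperature is X = predicted + error.
TABLE_I_A = [
    (73, 1), (74, 0), (66, 0), (73, 3), (73, -3),
    (73, -1), (67, -1), (72, 2), (72, -2), (67, 3),
    (72, 0), (68, 2), (67, -3), (67, 1), (70, -2),
    (68, -2), (70, 2), (68, 0), (70, 2), (70, -2),
]


def variance(values):
    """Mean square deviation from the mean (divisor N, as in Table I A)."""
    values = list(values)
    m = fmean(values)
    return fmean((v - m) ** 2 for v in values)


predicted = [p for p, _ in TABLE_I_A]
errors = [e for _, e in TABLE_I_A]
actual = [p + e for p, e in TABLE_I_A]

# Mean of the products of predicted deviations and errors (the x̄e column).
m_pred, m_err = fmean(predicted), fmean(errors)
mean_product = fmean((p - m_pred) * (e - m_err) for p, e in TABLE_I_A)

v_x, v_pred, v_e = variance(actual), variance(predicted), variance(errors)
print(v_x, v_pred, v_e)                    # 10.0 6.4 3.6
print(mean_product == 0)                   # True: errors independent of predictions
print(math.isclose(v_x, v_pred + v_e))     # True: the variances add

# Standard deviations do not add: roughly 3.16 versus 4.43.
print(math.sqrt(v_x), math.sqrt(v_pred) + math.sqrt(v_e))
```

The last line makes the text's point concrete: only the variance splits into additive parts; its square root, the standard deviation, does not.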

Let us choose such units that the variance in 
temperatures when no defining facts are given is 
1 (not 10 as in the data of Table I A). If the 
defining facts give such information that it is 
now known that the range of probable temperatures 
has a variance ½, there is still much variability
in the temperature to be expected 24 hours hence. 
The forecasting formula is inadequate to give pre- 
cise knowledge, so statistical issues of distri- 
bution of probable temperatures remain. They 
will continue to remain until forecasting is more 
accurate and the person, whether meteorologist
or layman, interested in this residual 50 per 
cent of the variance will continue indefinitely 
to be concerned with a statistical problem. 
Table I B may help to picture the field wherein 
statistics operates. 


TABLE I B

SITUATIONS CLASSIFIED ON BASIS OF USE
OF STATISTICS AND STATISTICAL METHODS

Initial    Knowledge of factors    Residual variance after
variance   determining it          using such knowledge

1.000      Highly precise          .001
    Statistics are commonly not used by: A. Chemists, engineers, and
    classes B, C, D, and E.
    Statistics are used by: F. Research scientists in certain fields of
    astronomy, physics, bureau of standards, coast and geodetic survey, etc.

1.00       Excellent               .05
    Statistics are commonly not used by: B. Technologists, "practical"
    professional men, and classes C, D, and E.
    Statistics are used by: Classes F and G. Research technologists,
    scientists in such fields as the physical sciences.

1.00       Very useful             .20
    Statistics are commonly not used by: C. Rule of thumb workers,
    executives, men who "do things," school men, and Classes D and E.
    Statistics are used by: Classes F, G, and H. Research workers in
    medicine, education, psychology, the biological sciences, and
    occasionally in the social sciences.

1.00       Useful                  .50
    Statistics are commonly not used by: D. Promoters, uncritical
    people, and Class E.
    Statistics are used by: Classes F, G, H, and I. Research workers in
    the social sciences, the more careful executives, salesmen, guidance
    counselors.

1.00       Meaningful, but poor    .80
    Statistics are commonly not used by: E. Semi-competent students,
    charlatans, and uncritical enthusiasts, gamblers who
    characteristically "foot the bill."
    Statistics are used by: Classes F, G, H, I, and J. Fairly intelligent
    people in all walks of life. Words indicative of such use are
    "probability," "tendency," "likelihood," "usually," "sometimes,"
    "generally," etc.

The primary field of statistics is represented
by the middle portion of the table. When knowl-
edge is nearly complete, and .999 of the variance
can be accounted for, few issues hinge upon
the small fraction of the variance which is un- 
known. When knowledge is very inadequate, the 
entire field is avoided so far as possible by 
serious people, and especially by scientific 
workers, and is left in the hands of uncritical 
people. The occasion for statistical thinking is 
so common that the question is not whether one 
shall use it, but how fine a tool it is in his 
hands. This book aims merely to make explicit 
certain logical implications of phenomena, and in 
no sense to provide a tool that has not first 
been demanded by the requirements of clear think- 
ing, as applied to quantitative and qualitative 
facts of life. 
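Each row of Table I B pairs an initial variance with the residual variance left after the knowledge of determining factors has been used; the share of the variance accounted for is one minus their ratio, which is how the top row yields the .999 mentioned above. A small Python sketch makes the arithmetic explicit; the row values are copied from Table I B, while the names `TABLE_I_B` and `proportion_accounted_for` are hypothetical.

```python
# (knowledge of factors, initial variance, residual variance) from Table I B.
TABLE_I_B = [
    ("Highly precise", 1.000, 0.001),
    ("Excellent", 1.00, 0.05),
    ("Very useful", 1.00, 0.20),
    ("Useful", 1.00, 0.50),
    ("Meaningful, but poor", 1.00, 0.80),
]


def proportion_accounted_for(initial: float, residual: float) -> float:
    """Share of the initial variance removed by using the known factors."""
    return (initial - residual) / initial


for knowledge, initial, residual in TABLE_I_B:
    share = proportion_accounted_for(initial, residual)
    print(f"{knowledge:<22}{share:6.3f}")
```

The "Useful" row corresponds to the forecasting case discussed earlier, where half the variance remains as a residual statistical problem.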

It should be obvious from Table I B that sta-
tistical methods should serve well many of the
common needs of ordinary folk. However, their
use and value in this connection are limited by
understanding and not by the intrinsic fitness of
the tool. In writing for a lay audience certain 
definite restrictions as to concepts and tech- 
niques employed are imposed. These may be very 
serious, making it well-nigh impossible to illus- 
trate crucial points. Such terms as variance and 
residual, used in preceding paragraphs, are 
taboo. Substitution for the word residual can be 
made, but no substitution for variance is possi- 
ble without first taking the time to teach quite 
a bit of statistics. As a consequence a large 
and important class of problems wherein a total 
variance is capable of analysis into its com- 
ponent parts i is beyond the scope of the under- 
standing of a popular audience. Undoubtedly 
attempts at popular presentation of this concept 
will be made, but we may expect them to be as 
unsatisfactory as the attempt to explain corre- 
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lation by means of such common concepts as per- 
centages of a total, proportion of agreement, 
and other elementary mathematical concepts, all 
of which give the impression of conveying infor- 
mation while actually leading astray. What does 
it mean to say that a correlation is 66 per cent 
of perfect, or such that there is 66 per cent 
agreement? Presentation of such concepts to lay 
audiences is fraught with misunderstanding. 

There remain many statistical concepts and techniques which may be employed in this case. There are available certain graphic methods, time charts, circle charts, growth curves, frequency polygons, means, ratios, percentages, limits, ranges, categories, frequencies in classes, and still other concepts. Some of these have been sadly overworked, as for example the "ratio" in connection with I.Q.'s (intelligence quotients), E.Q.'s (educational quotients), A.Q.'s (accomplishment quotients), etc., where concepts of growth, dispersion, and correlation are statistically neater and more informative. It should not be assumed that with literary skill the most recondite concepts can be presented to any audience of normal intelligence. This is scarcely so. Have the valiant attempts to present "relativity" in the Sunday supplements resulted in a nation informed upon this subject? The next three chapters are concerned with statistical techniques so elementary that they are serviceable in popular presentation, but it must not be assumed that they reasonably circumscribe the field of statistical effort of the practical school man, psychologist, or sociologist. They merely serve as an introduction to quantitative thinking.

Analysis of phenomena: Papers presented to scientific bodies may be concerned with the analysis of phenomena, but presentations to popular audiences are primarily propaganda, legitimate enough if the statistical picture given is accurate, but clearly serving a noninvestigative and nonanalytical purpose. The subtler phases of statistics deal with analysis, not with description. It is the good fortune of propagandists that this is so. It is very desirable that such a one should not overstep the limitations of his technique. To present some simple time chart and, by noting its ups and downs, proceed to analyze it and explain underlying causes is chicanery or wishful thinking. The analysis of a time chart calls for information far beyond that given in the chart itself. It is very easy for an investigator without dishonest intentions to (1) have a theory, (2) draw a chart, (3) see tendencies in it which are explicable in harmony with his theory, (4) think that the chart supports his theory, (5) present it to an audience, and (6) deduce in the presence of the audience the theory as consequent to the phenomena of the chart. Procedures like this discredit statistics and have, in poorly informed quarters, led to the charge that one can prove anything by statistics.

There is but a single story inherent in a given body of data, and this can be approximated by the delicate tool of statistical analysis. Its uniqueness is its outstanding feature. To believe in, or try to get, several stories out of it, or merely proof for some preconceived story, is to deny its genius. A Turner might paint the Smoky Mountains in brilliant red and present a charming picture, but what a sad story this would be to one who knows them, and what a false guide to the scout seeking landmarks in territory new to him.

The first function of statistics is to be purely descriptive, its second function is to enable analysis in harmony with hypothesis, and its third function is to suggest, by the force of its virgin data, analyses not earlier thought of. In carrying out the first function we shall deal in forthcoming chapters with graphic portrayal, phenomena of distribution of a single variable, and phenomena of relationships between two or more variables. All of this is elementary statistics.

More advanced statistics concerns itself with such a study of given phenomena that invariant features of them are discovered, features that were there, of course, all the time, merely awaiting an appropriate throwing aside of overlaid rubble to bring them to light. If a brilliant hypothesis has correctly conceived of some feature as being invariant, the testing out of the hypothesis is generally not difficult, though it transcends the simplicity of mere description. If no such hypothesis is present, there still is a possibility, by viewing the data now from this aspect, now from that, of discovering their stable features. If charged with painting a mountain, an artist might first ride around it on horseback to discover "the" vantage point above all others. So the statistician with new material may search diligently and by devious methods to find the unique story in his data. All the interest of the detective, all the joys of the artist, may be combined in his search.

Though statistical analysis has much in common with pure mathematics, there is one important difference. The mathematician assumes his initial conditions and then searches out their necessary consequences, building a more and more beautiful and intricate mathematical structure. The statistician starts not with an assumption but with given data, whose conditions deny arbitrary assumptions, and seeks to build a thought structure in harmony with them and to indicate consequences that are in harmony with them. If the statistician could but first choose the nature of his data and then ramify to his heart's content and announce consequential relationships, more and more remote consequences could be pointed out. But since given data are inflexible, the more remote a deduction drawn from their observed characteristics, the more uncertain its verity. Remote deductions are like the end of a whiplash: it may oscillate violently though the whip handle moves but slightly. The argument involving a long chain of consequential relationships, which is so powerful in mathematical analysis, can seldom be employed in statistical analysis. The slight initial error always to be expected in finite data may become augmented into a grievous falsehood by the time the end of the long argument has been reached. For sound statistical analyses the magnitude of errors in the initial starting point must be more or less accurately apprehended, and their significance carried along through every step, so that the final outcome of an analysis has attached to it a definite concept of probable error. The process of attaching to each step of a deductive argument a correlative step of probable error may be long and difficult, but it can scarcely be abridged.
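The whiplash comparison can be made concrete with a small numerical sketch. It assumes, purely for illustration, a deduction built as a chain of multiplicative steps, each relying on a quantity whose relative error is 1 per cent; the function names and the 1 per cent figure are hypothetical. The first function lets every error push in the same direction; the second uses the usual first-order rule that independent relative errors combine as the square root of the sum of their squares.

```python
import math

def worst_case_relative_error(step_error: float, steps: int) -> float:
    """Relative error after `steps` multiplicative steps when every step is
    off by `step_error` (a fraction) in the same direction."""
    return (1 + step_error) ** steps - 1

def independent_relative_error(step_error: float, steps: int) -> float:
    """First-order approximation when the step errors are independent:
    relative errors combine as the root of the sum of their squares."""
    return math.sqrt(steps * step_error ** 2)

# Hypothetical: each link in the chain of deduction carries a 1 per cent error.
for n in (1, 5, 10, 25, 50):
    print(f"{n:3d} steps: worst case {worst_case_relative_error(0.01, n):6.1%}, "
          f"independent {independent_relative_error(0.01, n):6.1%}")
```

Even under the more forgiving rule the fiftieth step is about seven times as uncertain as the first, and in the worst case the error exceeds 60 per cent.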

If now we turn to mathematics as a tool in 
statistics, we shall find it essential and of 
universal service. 

SECTION THE MATHEMATICAL BACKGROUND OF 
STUDENTS OF STATISTICS 

The common observation of teachers of statistics that their students are ill-prepared in arithmetic as well as algebra arouses no great surprise in colleagues in other departments, because every teacher is prone to blame the shortcomings of his instruction upon the earlier training of his pupils. Nevertheless, the teacher of elementary statistics has surprising ground for his charge. The writer's experience with elementary students has mainly been with those in education and psychology, but it has included a fair sprinkling of students of economics, biology, and mathematics. In terms of excellence of mathematical background he would rank them as follows: mathematics majors first, followed in order by biology, psychology, education, economics, and sociology majors. Of the education majors one subgroup, those specializing in English, has consistently shown the poorest mathematical equipment, and generally the least "flair" for statistical thinking. This ranking seems congruent with the presumptive interest and cultural antecedents of students majoring in these different fields. Frequently it is helpful to both student and teacher to realize these group differences.

A Background Test, so called because it was 
used to measure the mathematical background of 
students entering courses in statistics, was 
given to 406 students entering, or early in, a 
course in elementary statistics, drawn from 9 
classes at the Universities of Columbia, Harvard, 
Illinois, Minnesota, Oregon, and Stanford. The 
courses mentioned were given in psychology and 
education departments so that if the writer’s 
ranking of the mathematical background of stu- 
dents according to major fields above is fair, 
the 406 students represent a sampling which is 
not below average. One form of the Background 
Test is printed in Appendix A. Students who are 
desirous of finding out their standing in this 
fundamental background material may take the test 
and score it themselves, and interpret it via the 
percentile norms, given in Appendix A, based on 
the 406 students. 

The idea of making a "good" score on a test has become so established that students will be sorely tempted to cheat themselves. Some of the ways in which a taker of the test can do it are: (a) Read the test and scoring key in Appendix A before starting the test. (b) When taking the test, not rigidly adhere to time limits. (c) Be aware that he is failing because he does not quite understand the directions, and look up an answer or two just to get the drift of the thing. (d) Before taking the test, talk it over with fellow students or the instructor. (e) When scoring, credit himself with an answer which he intended to be the same as that called correct in the scoring key, but which actually is a little different. (f) Generally get rather disgusted with his dumb responses and give a little extra credit here and there. One who does any of these things while taking the test cannot have the satisfaction of an unbiased appraisal of himself in comparison with the run of college students starting a first course in statistics.

A number of very interesting tendencies have come to light as a result of giving the Background Test. It would be appropriate to discuss them here, but doing so would give students desirous of taking the test an advantage not enjoyed by the 406 from whom the percentile norms have been drawn up, and so would undo the standard conditions under which those norms were obtained.

Suffice it here to say that the sketchiness 
of the average entering student’s knowledge of 
arithmetic and elementary algebra would be ap- 
palling were it not true that even with such poor 
equipment students are still able to master some 
of the broad principles of statistical thinking 
and treatment of data. 

That, with appropriate background, accomplishment would be much greater than is now possible has been forcibly voiced by the Committee approved by the Advisory Committee (1933) of Social and Economic Research in Agriculture, making its report to the Social Science Research Council in 1932, entitled Collegiate Mathematics Needed in the Social Sciences. The purpose of the Committee was to determine what mathematics should be taught to students of the social sciences. As the Committee itself pointed out, its report has been largely concerned with the needs of students of economics. There is a slight overemphasis of mathematics connected with temporal series and index numbers in the report of the Committee, as judged by the needs of students of education and psychology, but, with slight modification, their statement is an excellent one as to these needs, and probably as to the needs of biology students as well. They write:

"The committee believes that a knowledge of 
the following phases of collegiate mathematics 
would be quite helpful and useful to a large 
proportion of the students in economics, and that 
many students in other social sciences would find 
it worth while to obtain the knowledge and train- 
ing that can be acquired through the study of 
these phases of mathematics, providing it can be 
done without taking too much time from the study 
of other subjects. 

"1. Logarithms: for understanding special 
plotting methods and forms of equations; for 
numerical computation in connection with curve 
fitting and elsewhere.

"2. Graphs: (as a mathematical tool) representation of tabulated data on ordinary and logarithmic papers; measurement of slopes and areas (especially in connection with frequency and cumulative curves); graphical determination of maxima and minima; representation of such three-variable relationships as are encountered in statistical studies, by means of three-dimensional diagrams and contours.

"3. Interpolation: by reading graphs (ordinary, semi-logarithmic, etc.); by proportional parts; and possibly by successive differences.

"4. Equations and forms of important curves: straight lines; parabolic and hyperbolic curves; exponential and logarithmic curves; logistic curves; sine and cosine curves. The object should be to make the student sufficiently familiar with the forms and characteristics of the simpler curves to enable him to recognize readily the type of curve suitable for fitting to any given body of data, and to appreciate the principal implications involved in the use of such curves.

"5. Probability: combinations, binomial theorem, elementary probability; the normal probability curve (its form, table, and equation); probability and frequency distributions; probability and time series.

"6. Elements of differential and integral calculus: the significance of a differential as a limit, as a rate or slope, as a frequency, etc.; partial differentiation; formula for differentiating elementary mathematical functions; procedure for determining maximum and minimum values; integration as the reverse of differentiation; relation between integration and summation; multiple integration.

"7. Curve fitting (mathematical principles): plotting tabulated values, their logarithms, reciprocals, etc., to ascertain whether the tabulated points lie on a curve of simple form; determination of the curve through such points, or selected points within the scatter-diagram or chart; the method of least squares or other methods for obtaining the constants of curves of best fit."

The Committee then points out that the background recommended can ordinarily be gotten by pursuing not less than 15 or 20 semester hours of work in a college department of mathematics, as courses are now ordinarily given, and they observe that this requirement is too heavy a demand upon economics majors, particularly in view of their belief that the desired mathematical training could be given in from 6 to 9 semester hours if mathematics courses were organized with this in view. The Committee recommends the organization of such courses in departments of mathematics. The proposal is an excellent one, but until it is carried out, what is the student of statistics in any one of the social sciences to do? If, in all justice to his program, he cannot take 15 or 20 hours of work in the mathematics department, what is the alternative? That must be decided by each student himself, but the writer recommends that the average student, desirous of mastering the elementary concepts of statistics and not expecting to major in it or to devote his life to research, first, take not less than a five-semester-hour course in college mathematics (a general comprehensive freshman course in college algebra, analytic geometry, etc., if available), and, second, do some serious reading upon the mathematical topics just cited to supplement his equipment. As a cue in this, the student is particularly advised to remedy his deficiencies as revealed by the Background Test of Appendix A. It may be stated that in constructing this test a broad survey was made to determine what are the elementary mathematical concepts demanded for the understanding of current research. Some of the relatively simple mathematical concepts found in the test are not heavily represented in the literature (for example, 11 3/8 divided by 1.65, or $a^3 \cdot a^6 = a^9$), whereas the more advanced concept $(a + b)^n$ is a common and powerful instrument in statistics. In other words, the student should read with a purpose which is dictated by the needs of the problem of the subject, not by previous organizations of knowledge. If he wishes to know how to plot y = sine of x, which upon recourse to references he finds discussed on page 300 of some text, it should not be necessary for him to read the preceding 299 pages to secure an answer, but it may readily happen that to understand page 300 he will have to follow back a chain of pages, perhaps like this: 300-299-298-240-241-242-150-84, ending at a family of concepts known by the student. This type of reading in mathematics is economical of time and effort, and is to be highly recommended.
That it can be followed in mathematics may not have occurred to students accustomed to think of the mathematics text as positively sequential, page by page. References could be given to innumerable texts showing where problems of each of the types of those in Appendix A are discussed, but any college student should be able to locate such references himself, and he is expected to do so in connection with each problem that he has missed, if he has taken the test. His arithmetical and algebraic shortcomings, as revealed by the test, especially if so pronounced as to place him in the lower quarter, should be remedied before he starts Chapter VI, and his analytic geometry before he starts Chapter IV. This statement asserts that the student is called upon to know certain principles of analytic geometry at an earlier stage than common principles of algebra. This is the case, and it is quite commonly so. The problems of social statistics have not been organized to fit the sequences of the writers of mathematics texts, or of the organizers of high school and college curricula in mathematics. We should be content with this situation and expect statistics to provide its own sequences in harmony with its own vista of social problems. An enumeration is here given of certain mathematical concepts which are important for statistics and which fall under the headings of the report cited:

Logarithms: There are two important sorts: logarithms to the base 10 (here designated "log"), which are used in computational work, and logarithms to the base e (here designated "ln"), occurring in formulas and curve fitting. The logarithm of a number, say 15, to the base 10 is the power to which 10 must be raised to give 15; thus $10^x = 15$, where x is the required logarithm. The Napierian or natural logarithm is the exponent of a magnitude called e, which to the first seven decimal places equals 2.7182818. Thus $e^y = 15$, or $2.7182818^y = 15$, where y is the natural logarithm of 15. From the usual table of logarithms we obtain

log 15 = 1.176091 = x = logarithm to the base 10

Employing a table of natural logarithms we obtain

ln 15 = 2.708050 = y = logarithm to the base e

Either one of these logarithms might readily have been obtained, knowing the other, because a constant ratio holds between them: logarithms to the base 10 may be gotten from logarithms to the base e by multiplication by a quantity M, which to the first nine figures equals .434294482, called the modulus of logarithms to the base 10. Thus log a = M ln a. For example, 1.176091 = .434294482 (2.708050). Systems of logarithms other than these two will scarcely concern the elementary student, but he should be able to use either of these in simple problems. The meaning of the terms "base," "modulus," "characteristic" (that part of a logarithm to the base 10 to the left of the decimal point), "mantissa" (that part to the right of the decimal point), "logarithm," "anti-logarithm" (the number corresponding to the logarithm), and "co-logarithm" (the logarithm of the reciprocal of the number) should be known, for there is commonly a peculiar fitness in logarithmic presentation and interpretation of ratios and indexes and of quantities having a natural zero point, as, for example, the price of a commodity or a wage. Where a zero point is not known to exist, or where it is unimportant or poorly defined, logarithmic treatment is seldom in order. These brief observations, as well as those of succeeding paragraphs, are not intended to instruct the reader in mathematical background, but to make him aware of the elementary phases of the subject with which he should become acquainted, if not already a master of them.
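The relations just described can be checked with Python's standard `math` module, which provides `log10` and `log` (the natural logarithm by default). A minimal sketch follows; the helper names `characteristic_and_mantissa` and `cologarithm` are hypothetical, and the characteristic is taken as the integer floor of the logarithm so that the mantissa always lies between 0 and 1, as in a printed table.

```python
import math

# Common and natural logarithms of 15, and the modulus M = log10(e).
log_15 = math.log10(15)        # about 1.176091
ln_15 = math.log(15)           # about 2.708050
M = math.log10(math.e)         # about .434294482

# log a = M ln a
assert math.isclose(log_15, M * ln_15)

def characteristic_and_mantissa(number):
    """Split the common logarithm of a positive number into its characteristic
    (taken here as the integer floor) and its mantissa (0 <= mantissa < 1)."""
    logarithm = math.log10(number)
    characteristic = math.floor(logarithm)
    return characteristic, logarithm - characteristic

def cologarithm(number):
    """Logarithm of the reciprocal of the number, i.e. minus its logarithm."""
    return math.log10(1 / number)

for n in (15, 150, 0.15):
    c, m = characteristic_and_mantissa(n)
    print(f"{n:>6}: characteristic {c:2d}, mantissa {m:.6f}")
print(f"colog 15 = {cologarithm(15):.6f}")
```

The mantissa .176091 comes out the same for 15, 150, and .15; only the characteristic changes, which is why a table of common logarithms need list mantissas alone.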

Graphs: So common a type of curve as that shown in Chart I-I, which the mathematically uninformed reader thinks he understands, takes on richness and niceties of meaning when one has a mental picture of the graphs of such functions as the parabola $y = bx^2$, the exponential $y = ae^{bx}$, the Gompertz curve $y = ab^{c^x}$, the simple logistic curve $y = \frac{a}{1 + be^{-cx}}$, or the growth-senescence curve $y = \frac{a}{e^{-bx} + e^{cx}}$. Simple graphic portrayal of data upon two-dimensional paper can commonly be made without any knowledge of the shape and properties of mathematical curves, but an examination of graphs without such knowledge is relatively sterile in suggesting relationships. A knowledge of logistic and more involved curves is not expected of elementary students of statistics, but they should, prior to statistical instruction, be acquainted with such elementary concepts as the straight line and its representation by an equation, the elementary forms of the second-degree equation $y = ax^2$, $yx = a$, $x^2 + y^2 = a^2$, and the simple exponential $y = a^x$, and they should soon become acquainted with more complex curves.

Interpolation: The veriest tyro is called upon to interpolate when using tabled data. The subject is of increasing importance as more refined work is done. At the initial stage the student should be able to use correctly linear interpolation, that based upon first-order differences, in connection with any tabled data. An illustration and definition of certain terms will be made in connection with Table I C. This is the common arrangement of a table of logarithms. The primary use of the table is in getting logarithms, knowing numbers, and its secondary use is in getting numbers, knowing logarithms. Based upon its primary use, the means of entering the table, called the argument, is a number, and the value gotten out of the table, called the consequent, is a logarithm. The logarithm of 5 is .6990, or, in general terms, we enter with the argument and obtain the consequent. A symbol which will be used in this text to indicate a one-to-one relationship between the measures of a first series and those of a second is =0=, which is to be read "is equivalent to," the equivalence being in terms of a given principle of relationship. Thus the statement 5 =0= .6990 means that, in terms of the law of relationship given by the table, that is, by the equation y = log x, the magnitude 5, when otherwise expressed, is the magnitude .6990.

TABLE I C

LOGARITHMS TO THE BASE 10

NUMBER    LOGARITHM
   1        .0000
   2        .3010
   3        .4771
   4        .6021
   5        .6990
   6        .7782
   7        .8451
   8        .9031
   9        .9542
  10       1.0000

The same table may be used to obtain a number corresponding to a logarithm. When so used, the entries in the second column become the argument and the entries in the first column the consequent. To find the equivalent corresponding to a given magnitude commonly calls for interpolation. Thus, to find the logarithm of 3.5 we go through this computation:

$$.5(.6021 - .4771) + .4771 = .5396$$

or this computation:

$$.5(.6021) + .5(.4771) = .5396$$

This process is linear interpolation. It is the simplest kind. There is ordinarily an error in the answer obtained. In this case the error is -.0045, because the correct answer gotten from a more detailed table is .5441. We may proceed similarly, as the reader may verify, to find the antilogarithm of .5540, that is, the number whose logarithm is .5540. The answer is 3.6152, which, from a more detailed table, is found to be in error by .0342.
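The two computations above can be sketched in a few lines of Python, using Table I C in both directions. The function name `linear_interpolate` is illustrative, and the "more detailed table" of the text is stood in for by `math.log10` and `10 ** x`.

```python
import math
from bisect import bisect_right

# Table I C: numbers (the argument) and their common logarithms (the consequent).
NUMBERS = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
LOGS = [.0000, .3010, .4771, .6021, .6990, .7782, .8451, .9031, .9542, 1.0000]

def linear_interpolate(arguments, consequents, x):
    """First-order (linear) interpolation in a table whose arguments increase."""
    if not arguments[0] <= x <= arguments[-1]:
        raise ValueError("argument lies outside the table")
    # Index of the tabled argument at or just below x (kept inside the table).
    i = min(bisect_right(arguments, x) - 1, len(arguments) - 2)
    fraction = (x - arguments[i]) / (arguments[i + 1] - arguments[i])
    return consequents[i] + fraction * (consequents[i + 1] - consequents[i])

# Primary use: number -> logarithm.
log_3_5 = linear_interpolate(NUMBERS, LOGS, 3.5)
# Secondary use: the same table read in reverse, logarithm -> number.
antilog = linear_interpolate(LOGS, NUMBERS, .5540)

print(f"log 3.5 ~ {log_3_5:.4f}, error {log_3_5 - math.log10(3.5):+.4f}")
print(f"antilog .5540 ~ {antilog:.4f}, error {antilog - 10 ** .5540:+.4f}")
```

Reading the table in reverse needs no separate routine: the logarithm column simply becomes the argument, exactly as the text describes.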

Arithmetically exact linear interpolation should be within the power of the student before he proceeds further, but parabolic and other improved methods involving second- and higher-order differences are not expected of him at this time. The reading of a chart or graph with considerable accuracy by means of visual interpolation should also be within his power.

The symbol =0= will be found serviceable in many ways, for example, to indicate scores on two tests which are comparable, or equivalent. Thus, if the fifth-grade mean score on the Abb spelling test is 48.7 and on the Baa spelling test is 17.0, these scores represent equivalent degrees of excellence, and we may write

Abb 48.7 =0= Baa 17.0

It may happen that the consequent for an argument outside the limits of a given table is desired. It can be gotten by a process of extrapolation, but, in general, the process is subject to considerable error. For example, suppose we have been given Table I C only as far as the number 6,

TABLE I C

LOGARITHMS TO THE BASE 10

NUMBER    LOGARITHM
   1        .0000
   2        .3010
   3        .4771
   4        .6021
   5        .6990
   6        .7782

and we desire the logarithm of the number 7. We proceed thus:

$$\frac{7 - 5}{6 - 5}(.7782 - .6990) + .6990 = .8574$$

This is in error by .0123, the correct value being .8451. If the distance extrapolated is small, the error may not be great, but at best extrapolating is more hazardous than interpolating, and effort should be made to avoid the necessity of doing so. The more refined methods greatly improve the accuracy of interpolation, but they may not improve extrapolation at all.
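A minimal Python sketch of the extrapolation just made: the straight line through the last two entries of the shortened table is extended beyond its end. The function name is hypothetical, it handles only extrapolation past the upper end, and `math.log10` stands in for the full table when measuring the error.

```python
import math

# Table I C carried only as far as the number 6.
NUMBERS = [1, 2, 3, 4, 5, 6]
LOGS = [.0000, .3010, .4771, .6021, .6990, .7782]

def extrapolate_beyond_end(arguments, consequents, x):
    """Extend the straight line through the last two tabled entries to an
    argument x beyond the end of the table (linear extrapolation)."""
    x0, x1 = arguments[-2], arguments[-1]
    y0, y1 = consequents[-2], consequents[-1]
    return y0 + (x - x0) / (x1 - x0) * (y1 - y0)

for x in (7, 8, 10):
    estimate = extrapolate_beyond_end(NUMBERS, LOGS, x)
    print(f"log {x:2d} ~ {estimate:.4f}, error {estimate - math.log10(x):+.4f}")
```

The error of .0123 at 7 grows steadily as the argument moves farther from the table, to about .03 at 8 and nearly .10 at 10.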

Equations and forms of important curves: The student would do well to plot one or more curves of each of the types listed by the Committee and preserve them for reference. The most difficult one to plot is the logistic curve, which calls for the use of logarithms or, most simply, of a "log log" slide rule.

In addition to the curves mentioned, the following are particularly important to the student of education or psychology: the normal curve; the integral of the normal curve, which looks much like the logistic curve and, along with other curves having the characteristic of a single inflection, is called an ogive curve; and peaked, flat-topped, asymmetrical, and bimodal curves, the equations of which are somewhat involved and beyond the scope of an elementary treatment. Even so, the shapes of these curves, and the knowledge that concise mathematical statements of each of them are available, are valuable facts to know.
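Where a computer is at hand instead of a slide rule, tabulating points for plotting growth curves is straightforward. The sketch below assumes the forms y = a/(1 + be^(-cx)) for the simple logistic and y = ab^(c^x) for the Gompertz curve; the function names and parameter values are hypothetical, chosen only so that both curves start at 10 and rise toward 100.

```python
import math

def logistic(x, a, b, c):
    """Simple logistic curve y = a / (1 + b e^(-cx)); rises toward the asymptote a."""
    return a / (1 + b * math.exp(-c * x))

def gompertz(x, a, b, c):
    """Gompertz curve y = a * b ** (c ** x); with 0 < b < 1 and 0 < c < 1
    it rises from a * b toward the asymptote a."""
    return a * b ** (c ** x)

# Hypothetical parameters: both curves start at 10 and approach 100.
print(" x  logistic  gompertz")
for x in range(0, 13, 2):
    print(f"{x:2d}  {logistic(x, 100, 9, 0.5):8.2f}  {gompertz(x, 100, 0.1, 0.7):8.2f}")
```

Both are single-inflection (ogive-shaped) curves, but they differ in symmetry: the logistic inflects at x = ln(b)/c, where y = a/2, while the Gompertz curve inflects lower, where y = a/e.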

Probability: First in this connection is knowledge of the binomial expansion $(a + b)^n$. This is closely associated with the concept of permutations and combinations, and with Poisson's series, point distributions, the normal distribution, and certain skewed distributions. However, most of these topics are appropriate for a second course in statistics, so that the beginning student is fairly prepared if he has a knowledge of the binomial theorem and can handle a few simple problems in combinations and permutations.
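Python's `math.comb` and `math.perm` (Python 3.8 and later) cover the simple problems in combinations and permutations mentioned above. The helper names below are illustrative, and the coin example assumes a fair coin; `Fraction` keeps the probabilities exact.

```python
from fractions import Fraction
from math import comb, perm

def binomial_coefficients(n):
    """Coefficients of the expansion of (a + b)^n; the k-th is C(n, k),
    the coefficient of a^(n-k) b^k."""
    return [comb(n, k) for k in range(n + 1)]

def binomial_probabilities(n, p):
    """Terms of (q + p)^n with q = 1 - p: the chance of exactly k successes in n trials."""
    q = 1 - p
    return [comb(n, k) * q ** (n - k) * p ** k for k in range(n + 1)]

print(binomial_coefficients(4))       # [1, 4, 6, 4, 1]
print(perm(5, 2), comb(5, 2))          # 20 10
# Four tosses of a fair coin: chances of 0, 1, 2, 3, 4 heads.
print(binomial_probabilities(4, Fraction(1, 2)))
```

The terms of the expansion sum to 1, since $(q + p)^n = 1^n$.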

Differential and integral calculus: An elementary knowledge of calculus is of such vast importance to statistics that beginning students interested in a second or third course in the subject should take an elementary course in calculus, but even without such a course the student may be expected to handle a first course in statistics.

Curve fitting and the method of least squares: These topics are equally appropriate to mathematics, engineering, statistics, and certain other fields, but even an elementary knowledge of them is hardly to be expected before entering a first course in statistics. In fact, they might well be the legitimate subject matter of a second course, and not a prerequisite to it.

Other topics not specifically mentioned in the report of the Committee: The propagation of error in the terms of a chain of operations. This issue has to do with the number of figures which are significant in a sum, a product, a quotient, etc., when the number of significant figures entering into the parts is known. The subject, though fundamental to quantitative study in every scientific field, is seldom treated in arithmetics and algebras. Accordingly, a brief treatment of the simpler phases of it is given herewith.

In pure mathematics "significant figures" are those that are quantitatively meaningful and not merely determinative of the decimal point. Thus in each of the three numbers .7568, 75.68, and .007568, the number of significant figures is 4. In the case of .007568 (which many scientists would prefer to express as $7.568 \times 10^{-3}$) the first two zeros merely serve to place the decimal point and are not "significant." To write 756800 so that the reader will know that the last two zeros merely place the decimal point and are not themselves significant, it should be given as $7568 \times 10^{2}$ or as $7.568 \times 10^{5}$. If written 756800, it is a number of six significant figures. Thus in mathematics the number of significant figures in a recorded value is the number of digits recorded, not counting the zeros immediately to the right of the decimal point in the case of numbers less than one. This meaning is useful in statistics, but another meaning of the term is also important.
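The mathematical counting rule can be checked with Python's standard `decimal` module, which stores a number's digits without the zeros that merely place the decimal point. This is a minimal sketch rather than a general-purpose routine: the function name `significant_figures` is our own, and, as the text specifies, zeros written at the end of an integer such as 756800 are counted.

```python
from decimal import Decimal

def significant_figures(recorded: str) -> int:
    """Count significant figures in a recorded value given as a string.

    Zeros that only place the decimal point (as in .007568) are not part of
    a Decimal's stored coefficient, so they are not counted. Trailing zeros
    written out, as in "756800", are stored and therefore counted.
    """
    return len(Decimal(recorded).as_tuple().digits)

for value in [".7568", "75.68", ".007568", "7.568e-3", "756800", "7568e2", "7.568e5"]:
    print(f"{value:>9} -> {significant_figures(value)}")
```

Every value listed prints 4 except 756800, which prints 6; writing it as 7568e2 or 7.568e5 restores the intended four figures.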

When any measurement is taken, the number of figures or decimal places to which the measurement is expressed should be limited by the accuracy of the means of measurement employed. If a wooden meter stick is used by a palsied man to measure the length of a live angleworm, and the answer is recorded as 7.64381 centimeters, it is obvious that most of the figures put down are not "significant," that is, not trustworthy. Probably the 7 is significant, and the 6 may be better than a guess, but the 4, 3, 8, and 1 are not entitled to the space and attention given them. In fact, they should not appear as a part of the measurement. The measurer may have looked upon the 4, 3, 8, and 1 as representing the best that he could do, but that is not sufficient to warrant recording them. If he felt sure of the 7 and only somewhat doubtful of the 6, it would be appropriate to record the length as 7.6 centimeters. In this case, which is intended to be typical of published data, the last figure recorded is admittedly somewhat in error, but not so much in error as to be no better than a guess, and the figure preceding the last is supposed not to be in error at all, except only as there may be a small variation of a tenth carried over from the error in the figure to the right. Counting decimal places from left to right, we may lay down the rule: record through the first figure thought to be in error, that error being not more than 5 units in its place. Thus, if the length is recorded as 7.6, the reader is to understand that the author considers the error in the tenths place to be not greater than 5 and not less than 1/2 in this place, for, if it were less than 1/2, the recording would have been 7.64, suggesting an error not greater than 5 in the hundredths place. Let us write the length thus, $7.\overset{*}{6}4381$, meaning thereby that there is an expected error of not more than 5 nor less than 1/2 in the figure under the star, so that of course the 4, 3, 8, and 1 are meaningless.
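The recording rule lends itself to a short sketch, under the assumption (ours, not the author's) that the measurer can state his doubt as an approximate absolute error, such as .1 centimeter. Counting places from left to right, the sketch keeps the first place in which that error amounts to at least 1/2 unit; since the error was less than 1/2 unit in the place before, it is less than 5 units in the place kept. The names `places_to_record` and `record` are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

def places_to_record(error: str) -> int:
    """Decimal places to record, given an estimated absolute error.

    Returns the leftmost place (1 = tenths, 2 = hundredths, ...; 0 = units,
    -1 = tens, ...) in which the error amounts to at least 1/2 unit.
    The search starts far to the left, at the 10**20 place.
    """
    err = Decimal(error)
    if err <= 0:
        raise ValueError("the estimated error must be positive")
    places = -20
    while err * Decimal(10) ** places < Decimal("0.5"):
        places += 1
    return places

def record(value: str, error: str) -> Decimal:
    """Round a measurement so that its last figure is the first one in error."""
    unit = Decimal(1).scaleb(-places_to_record(error))  # e.g. 0.1 for tenths
    return Decimal(value).quantize(unit, rounding=ROUND_HALF_UP)

print(record("7.64381", "0.1"))    # sure of the 7, doubtful of the 6
print(record("7.64381", "0.004"))  # a finer instrument
print(record("174328", "300"))     # error in the hundreds place
```

The three calls print 7.6, 7.644, and 1.743E+5; the last is the power-of-ten form the text recommends for large computed statistics.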

So much for original data. Now what happens when numerical magnitudes having errors are added to, multiplied by, and divided by other magnitudes with errors? Consider

$7.\overset{*}{6}4381 + 8.\overset{*}{4}7653 = 16.12034$

Over which figure does the star belong in this answer? Consider also the average

$\dfrac{7.\overset{*}{6}4381 + 8.\overset{*}{4}7653}{2} = 8.06017$

Which figure should be starred in this answer?

Clearly the first answer is $16.\overset{*}{1}$, and the second $8.\overset{*}{1}$.

Consider the case

$\dfrac{7.\overset{*}{6} + 8.\overset{*}{4}}{2} = 8.0$

There is an error of unknown amount, but presumably not greater than 5 nor less than 1/2, in the tenths place of each addend. To make the problem specific, let us say that the error is 1 in this place and that we do not know whether it is positive or negative. Then 7.6 is in error by .1, and the correct value may be written (7.6 ± .1). Similarly the correct value for the second addend is (8.4 ± .1), and we have

$\dfrac{(7.6 \pm .1) + (8.4 \pm .1)}{2} = 8.1, \text{ or } 8.0, \text{ or } 8.0, \text{ or } 7.9,$

depending upon which signs are taken. The value 8.0 [given by (7.6 + 8.4)/2] is thus in error by -.1, or .0, or .0, or +.1. The algebraic average of these errors is 0, and their average irrespective of sign is .05 [given by (.1 + .0 + .0 + .1)/4], which is seen to be less than the error in the measures separately, which was .1. If a positive error in the first addend goes with a positive error in the second, or a negative error in the first with a negative error in the second, then the error in the average is of the same order of magnitude as it is in the addends separately; but if this is not the case, the error in the average is of a smaller order than in the addends separately.
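The four cases can be enumerated mechanically. The sketch below assumes, as the text does, an error of exactly 1 in the tenths place of each addend with its sign unknown; Python's `decimal` module keeps the tenths exact so that the printed errors match the figures above.

```python
from decimal import Decimal
from itertools import product

recorded = [Decimal("7.6"), Decimal("8.4")]
error = Decimal("0.1")           # error of 1 in the tenths place, sign unknown
obtained = sum(recorded) / 2     # the average actually computed: 8.0

errors = []
for signs in product((1, -1), repeat=len(recorded)):
    # One possible set of correct values, e.g. (7.6 + .1, 8.4 - .1)
    correct = [x + s * error for x, s in zip(recorded, signs)]
    true_mean = sum(correct) / 2
    errors.append(obtained - true_mean)
    print(f"signs {signs}: correct average {true_mean}, error {obtained - true_mean}")

algebraic_mean = sum(errors) / len(errors)
mean_irrespective_of_sign = sum(abs(e) for e in errors) / len(errors)
print("algebraic average of errors:", algebraic_mean)
print("average irrespective of sign:", mean_irrespective_of_sign)
```

The sign patterns come out in the same order as in the text: correct averages 8.1, 8.0, 8.0, 7.9.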

If the errors in the addends are "random" or "chance," there is much less error in the mean than in the measures singly. The specific nature of this decrease in error is shown in Chapter VI, where the standard error of the mean is derived. As there proven, the order or size of the error in a mean is $\frac{1}{\sqrt{N}}$ times the order of size of the errors in the approximately equally reliable addends, where N is the number of addends yielding the mean in question.
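The $\frac{1}{\sqrt{N}}$ rule can be checked numerically in its simplest case. The sketch assumes (a simplification of ours, not the derivation of Chapter VI) that each of N addends is in error by exactly +e or -e, independently and with equal chance, enumerates all $2^N$ sign patterns, and takes the root-mean-square error of the mean, the measure of size on which the standard error rests. It is not the average irrespective of sign used in the two-addend example: for N = 2 it gives .0707 rather than .05.

```python
import math
from fractions import Fraction
from itertools import product

def rms_error_of_mean(e: Fraction, n: int) -> float:
    """Root-mean-square error of the mean of n addends.

    Assumes each addend is in error by exactly +e or -e, the signs being
    independent and equally likely, so all 2**n sign patterns are
    enumerated with equal weight.
    """
    total = Fraction(0)
    for signs in product((1, -1), repeat=n):
        mean_error = e * Fraction(sum(signs), n)
        total += mean_error ** 2
    return math.sqrt(total / 2 ** n)

e = Fraction(1, 10)
for n in (1, 2, 4, 9, 16):
    print(f"N = {n:2d}: rms error of mean {rms_error_of_mean(e, n):.5f}, "
          f"e / sqrt(N) = {float(e) / math.sqrt(n):.5f}")
```

Exact fractions are used so that the agreement with $e/\sqrt{N}$ is not blurred by rounding in the enumeration.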

It is left as an exercise for the reader to 
demonstrate that the error in a difference be- 
tween two numbers has the same limits as to size 
as that in their sum. In this instance there is 
a cumulative effect if a negative error in the 
minuend tends to be associated with a positive 
error in the subtrahend, or vice versa. 
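One way to check the exercise numerically: under the same assumed error of .1 with unknown sign in each of two terms, the sketch below lists the error carried into their sum and into their difference. Both sets range from -.2 to +.2, and the extreme errors in the difference arise exactly when the two terms err in opposite directions.

```python
from decimal import Decimal
from itertools import product

e = Decimal("0.1")   # assumed error of each term, sign unknown

sum_errors, diff_errors = [], []
for s1, s2 in product((1, -1), repeat=2):
    e1, e2 = s1 * e, s2 * e          # errors in the two recorded terms
    sum_errors.append(e1 + e2)       # error carried into a + b
    diff_errors.append(e1 - e2)      # error carried into a - b

print("sum:       ", ", ".join(str(x) for x in sum_errors))
print("difference:", ", ".join(str(x) for x in diff_errors))
```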

With this discussion as a background, we may now lay down the principle that the prime consideration in the collection of data to be used in determining means, or sums, or differences, is that they be not subject to "systematic" error, i.e., all prone to errors of the same sign. A very elementary illustration would be measuring the height of a certain age group of boys with shoes on, and reporting the average height for the use of people unaware that the height as reported included the shoe heel. We would expect so obvious an error as this to be caught by anyone, but many equally systematic, though not equally obvious, errors enter into economic, psychological, and other nonphysical measurements, ordinarily of great importance to social scientists. The student of statistics should school himself to be sensitive to possible sources of systematic error.

Further discussion of this is given in Chapters VI and VII in connection with the standard error and the probable error of the mean.

If two numbers having errors are multiplied, where is the error in the product? This question may be answered more definitely than the one concerned with a sum or an average. Consider

$7.\overset{*}{6} \times 8.4\overset{*}{8} = 64.448$

To make the problem concrete and at the same time quite typical, let us consider the error to be 1 in the place starred, but we do not know whether the error is positive or negative. Thus the correct factors and the correct product are given by

$(7.6 \pm .1)(8.48 \pm .01) = 65.373, \text{ or } 65.219, \text{ or } 63.675, \text{ or } 63.525,$

whereas the obtained answer is 64.448, so that the error is one of the following: -.925, or -.771, or .773, or .923. The proportionate error in the 7.6 is $\frac{.1}{7.6}$, or .01316. In the 8.48 it is .00118, and in the product it is, regardless of sign, .01435, or .01196, or .01199, or .01432. The average of these, regardless of sign, is .01315, which is approximately the same as the larger proportionate error in the two factors. In fact, we may in general consider the proportionate error in a product as of the same order of magnitude as that in the factor having the larger proportionate error.
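The product example can be reproduced with a short sketch in Python's `decimal` module, again assuming an error of 1 in the starred place of each factor with unknown sign. Proportionate errors are rounded to five places for display only.

```python
from decimal import Decimal, ROUND_HALF_UP
from itertools import product

a, da = Decimal("7.6"), Decimal("0.1")     # error of 1 in the tenths place
b, db = Decimal("8.48"), Decimal("0.01")   # error of 1 in the hundredths place
obtained = a * b                           # 64.448

def five_places(x: Decimal) -> Decimal:
    """Round to five decimal places for display."""
    return x.quantize(Decimal("0.00001"), rounding=ROUND_HALF_UP)

print("proportionate error in 7.6: ", five_places(da / a))
print("proportionate error in 8.48:", five_places(db / b))

corrects, proportionate = [], []
for sa, sb in product((1, -1), repeat=2):
    correct = (a + sa * da) * (b + sb * db)   # one possible correct product
    error = obtained - correct
    corrects.append(correct)
    proportionate.append(abs(error) / obtained)
    print(f"correct {correct}  error {error}  proportionate {five_places(abs(error) / obtained)}")

average = sum(proportionate) / len(proportionate)
print("average proportionate error:", five_places(average))
```

Averaged before rounding, the four proportionate errors give .01316, the same to five places as the proportionate error in 7.6; averaging the four rounded figures instead gives .013155, which the text records as .01315.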

Consider the two problems 

316 X .315 = 99.540 
316 X .317 = 100.172 

To publish the first product as 99.5 may overstate its accuracy, and to publish the second as 100. may understate its accuracy. The precise number of figures to publish is accurately determinable only with the aid of more information than is given here and than, in fact, is likely to be available, but as a first approximation we may adopt the rule: Publish a product to as many significant figures as in the least reliable factor. Publish a quotient to as many significant figures as in the least reliable term. The number of places carried in computation should be at least one beyond the number to be published, and if a long chain of operations is involved, it may occasionally be wise to carry computations to two places beyond that demanded for publication.
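The first-approximation rule for products can be sketched as follows. The helper names are hypothetical; significant figures are counted in the mathematical sense defined earlier, and `decimal` is used so that the product is computed exactly before it is rounded.

```python
from decimal import Decimal, ROUND_HALF_UP

def significant_figures(recorded: str) -> int:
    # Leading zeros are not part of a Decimal's stored coefficient.
    return len(Decimal(recorded).as_tuple().digits)

def round_to_figures(x: Decimal, figures: int) -> Decimal:
    """Round x to the given number of significant figures
    (a carry such as 99.96 -> 100.0 can add one figure; this sketch ignores that)."""
    if x == 0:
        return x
    unit = Decimal(1).scaleb(x.adjusted() - figures + 1)
    return x.quantize(unit, rounding=ROUND_HALF_UP)

def publish_product(*factors: str) -> Decimal:
    """First approximation: as many figures as in the least reliable factor."""
    result = Decimal(1)
    for f in factors:
        result *= Decimal(f)
    figures = min(significant_figures(f) for f in factors)
    return round_to_figures(result, figures)

print(publish_product("316", ".315"))   # 99.540 is published with three figures
print(publish_product("316", ".317"))   # 100.172 is published with three figures
```

The two calls print 99.5 and 100, precisely the publications the text warns may overstate and understate accuracy; the sketch applies the rule only as the first approximation it is.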

It is suggested that the reader investigate 
by cut and try methods what modification or re- 
finement of this statement is needed to make it 
more accurate. As examples, when the proportion- 
ate errors are the same in the two factors, 
consider at least two cases: (a) when the error 
is not in the first significant figure, and (b) 
when it is. 

The novice in experimental investigation, combining three statistics such as the following

103 X .034 X 1.200 = 4.2024

to obtain a final outcome, is likely to be uncritical in the distribution of time and effort which he puts upon the various aspects of his problem. Suppose the terminal statistic equals $4.\overset{*}{2}024$. Suppose .034 is gotten by using an instrument, for example a psychological test, provided by some earlier worker, and 103 and 1.200 by using newly devised instruments. Refinement in these, leading to the measures $10\overset{*}{3}$ and $1.20\overset{*}{0}$, has been accomplished at great labor, and it is futile in view of the error inherent in $.03\overset{*}{4}$. That a chain is as strong as its weakest link has been overlooked time and again by semicompetent statisticians.
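To see which link is weakest in the chain above, one can compare the proportionate errors of the three factors. The text does not state their errors, so the sketch assumes, for illustration only, that each recorded value may be off by half a unit in its last recorded place; the function name is hypothetical.

```python
from decimal import Decimal

def half_unit_proportionate_error(recorded: str) -> Decimal:
    """Proportionate error if a recorded value is off by half a unit in its
    last recorded place (an assumed, illustrative size of error)."""
    x = Decimal(recorded)
    half_unit = Decimal(5).scaleb(x.as_tuple().exponent - 1)  # .5 for 103, .0005 for .034
    return half_unit / abs(x)

factors = ["103", ".034", "1.200"]
for f in factors:
    print(f"{f:>6}: proportionate error about {half_unit_proportionate_error(f):.4f}")

weakest = max(factors, key=half_unit_proportionate_error)
result = Decimal(1)
for f in factors:
    result *= Decimal(f)
# The error in the product is of the order of the weakest factor's proportionate error.
rough_error = result * half_unit_proportionate_error(weakest)
print("weakest link:", weakest)
print(f"product {result}, rough error {rough_error:.2f}")
```

Under this assumption the rough error in 4.2024 is about .06, more than half a unit in the tenths place but less than half a unit in the units place, so the figure under the star is the tenths digit and the outcome is published as 4.2; no refinement of 103 or 1.200 would change that.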

On the other hand, in many statistical problems the analogy with a chain is unsound. Thus, characteristically, the mean is not as unreliable as the most unreliable measure which has entered it. Here the analogy may be to a rope of many strands, whose strength is much greater than that of its weakest strand.

It is sound practice so to report quantitative results that a figure based upon observations, such as, say, 1.743, carries with it the meaning that the 3, but not the 4, may be expected to be in error. Suppose a finally computed statistic is 174328, with an expected error in the last three figures. Then the result should be reported in some such manner as $1.743 \times 10^{5}$. The mere fact that the decimal point happens to be one or more places to the right of the last significant figure does not constitute warrant for publishing the meaningless figures, such as the 2 and the 8. An investigator should think of a computed statistic as in three parts, thus: 174 | 3 | 28, the first nearly indubitably trustworthy, the third substantially meaningless, and the middle part, of one digit only, of questionable significance; and he should publish the first two parts only.
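The three-part view of a computed statistic, and its publication with a power of ten, can be sketched in a few lines. The helper `three_parts` is hypothetical and handles only a digit string such as 174328; the number of trustworthy leading digits must be supplied by the investigator's judgment.

```python
def three_parts(digits: str, trustworthy: int) -> tuple[str, str, str]:
    """Split a computed statistic's digits into
    (trustworthy, questionable, meaningless) parts."""
    return (digits[:trustworthy],
            digits[trustworthy:trustworthy + 1],
            digits[trustworthy + 1:])

first, middle, rest = three_parts("174328", 3)
print(f"{first} | {middle} | {rest}")

# Publish only the first two parts; the exponent places the decimal point.
published_figures = len(first + middle)
print(f"{int('174328'):.{published_figures - 1}e}")
```

The two lines printed are `174 | 3 | 28` and `1.743e+05`, the latter being Python's notation for $1.743 \times 10^{5}$.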

In a formula involving several terms, some of which are much less reliable than others, there is generally little point in keeping the more reliable figures to as many places as they are significant, for the error in the outcome is commonly determined by the error in the factor having the largest error. When addition and subtraction are involved, the largest absolute error in the parts is to be considered, and when multiplication and division are involved, the largest proportionate error is the error of importance. A thorough discussion of this matter is to be found in Scarborough's Numerical Mathematical Analysis (1930), and an excellent discussion, with numerical illustrations, is given in Walker's Mathematics Essential for Elementary Statistics (1934).

And finally, as an essential part of the background of the student of statistics, computational accuracy and a knowledge of methods of checking it should be mentioned. Many a student lacks computational accuracy in simple arithmetical processes, and few have the habit of checking a computation by performing it in a second and independent way. The idea that the need for such a habit has passed with the advent of computing machines is stultifying. The writer has found no operation in greater need of checking by the ordinary student than the extraction of square root. The habit of proving the root found by squaring it is simple and very effective.
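The squaring check can be sketched with `decimal`. Because a root recorded to a given number of places is itself rounded, the sketch accepts the root when the radicand lies between the squares of the root minus and plus half a unit in its last place; the function name and the deliberately mistaken root 1.441 are illustrative.

```python
from decimal import Decimal

def root_checks(radicand: str, root: str) -> bool:
    """Prove a square root by squaring.

    The root is accepted if the radicand lies between the squares of
    (root - half a unit) and (root + half a unit) in the root's last place,
    i.e. if the true root rounds to the root as recorded.
    """
    n, r = Decimal(radicand), Decimal(root)
    half_unit = Decimal(5).scaleb(r.as_tuple().exponent - 1)
    return (r - half_unit) ** 2 <= n <= (r + half_unit) ** 2

print(root_checks("2", "1.414"))    # True
print(root_checks("2", "1.441"))    # False: transposed digits are caught
print(root_checks("10", "3.162"))   # True
```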

We may assert that there is need for a proper psychological background. One's processes in arithmetic and algebra have commonly been built up through the vehicle of abstract numbers. Many of the magnitudes entering into a statistical study are not of this sort, while others are. For example, in $y = 3a^{2}x^{b}$ it might be that 3, 2, and b are abstract quantities, not subject to redetermination or experimental change, operating upon the other measures and enabling them to be expressed in an equivalent form y. If so, the abstract number concept with reference to these is appropriate. The magnitude a might be a statistical concept, earlier determined by oneself or by another investigator, and not intended to be changed in this experiment, but nevertheless an experimental constant and not of the nature of a pure abstract number. The magnitude x may be the crucial observation of this experiment. As it is operated upon, every derivative from it retains the stigma of its origin. If the observation x has a meaning and if the final function of it, y, has a meaning, then every intermediate function has a definite physical meaning, and the proper psychological attitude of the student is to trace this out step by step. It is questionable whether the meaningfulness of the outcome is ever accurately and fully appraised if the meaning of the successive steps in the process has been obscure. Certain magnitudes that the statistician works with are living and vibrant in a sense that the abstract symbols are not. The 3, 2, b, and a provide the form of expression, but the life that flows through is the x, and it is this life that gives primary meaning to each successive function that is dealt with. It is a social blood stream flowing through algebraic arteries.

Plan of study: The table of contents provides the plan of this text. The first ten chapters constitute an integrated sequence built upon the background indicated in the preceding paragraphs. The later chapters provide occasional advanced techniques. In general, in these chapters, citations to sources supplant full derivations. This text may be used as a basis for an advanced course in statistics, especially by pursuing the references and developing the techniques of the later chapters. The student studying it as a first course should realize that the early chapters give the first steps in many sequences. Following upon the elementary graphic work of this text is graphic analysis; upon a study of means and standard deviations, a study of other descriptive features of a distribution; upon simple correlation of quantitative measures, correlation between two variables expressed in categorical and ordered series, and correlation issues involving more than two variables; following the normal distribution in one variable is the normal-correlation surface; following linear regression, which is equivalent to fitting a straight line, is curve fitting of all sorts; and paralleling an algebraic interpretation, running throughout all the preceding, is a geometric and trigonometric interpretation involving more than three dimensions. A member of an elementary class in statistics made a most stupid remark, unless perchance it was a bit of wry humor. He said, "The work of this course gives practically all the statistics that a schoolman could ever use, doesn't it?" He might as well have observed that only a little care in thinking is useful if one is engaged in a small business.

An early step in a statistical problem is the 
collection of data. This problem has been treated 
of in general terms (see Keynes, 1921) and, of 
course, in detail in connection with many special 
problems, particularly those of economics (see 
Zizek, 1913 and Young, 1925). Because of its 
almost infinite ramifications, depending upon the 
purpose and amenability to control of the phe- 
nomena under observation, the process is not 
seriously presented in this text. Its study is 
most appropriate after the purpose of an investi- 
gation is clearly defined, and after physical 
limitations of time and expense are known. 

A number of terms have been used in this chapter which have technical statistical or mathematical meaning. Since this section is concerned with the appropriate background for a study of statistics, these, together with certain other equally elementary statistical terms and symbols, are defined and criticized herewith. Additional terms will be similarly presented in subsequent chapters. The arrangement is alphabetic except for such closely related terms as call for definition together. The explanations of terms here given are intended to be accurate as far as they go, but not necessarily complete, because it is thought that definitions so rigorous as to include the fields of meaning of higher mathematics would be confusing to the elementary student.

absolute value: the value of a magnitude taken positively; thus the absolute value of -7 is 7, and the absolute value of x is (-x) if x itself is negative. Bars enclosing a quantity indicate absolute value, thus: |x|, or |-7|.

abscissa: see coordinates.

arbitrary origin: see coordinates.

argument: the value with which a table is 
entered when a related value, called the conse- 
quent, or tabled value, is sought. If one finds, 
from a table, the logarithm of 5, the number 5 is 
the argument, and the logarithm, namely .69897, is 
the consequent. 

axis: a straight line in a figure with reference to which the figure is symmetrical, or one of the axes of reference. See coordinates.

biometry: the application of measurement to 
living things. 

correlation: the tendency of paired measures 
to vary concomitantly. This term is further de- 
fined in Chapter X. 

crude score: see score.

consequent: see argument.

constant: see literal notation.

coordinates: there are two kinds commonly used in connection with two-dimensional representation, rectangular and polar. Rectangular coordinates consist of two lines at right angles, the horizontal axis, or abscissa, and the vertical axis, or ordinate. The intersection of the axes is the origin, or zero point, from which measurement is taken. Two variables related by some equation or law may be represented, one by distance in the direction of the abscissa, and the other by distance in the direction of the ordinate. Paired values define a single point on the two-dimensional surface. The connection of all such points yields a curve (still called a curve though it be a straight line or composed of straight-line segments). If this curve is such that, for each value of the first variable, there is one and only one value of the second, the second variable is a single-valued function of the first. If at the same time the first is a single-valued function of the second, then the curve is a monotonic curve, one that is always rising or always falling. The singular "coordinate" refers to distance in either one of the two dimensions. The plural "coordinates" refers either to the two coordinates of some point or to the two lines at right angles through the origin, which are also called the axes of reference. If the point from which measurements are taken is chosen for convenience, or if it has no special physical significance, the origin is called an arbitrary origin. Polar coordinates define a point in a space of two dimensions by distance from an origin and angular distance from a given line. This system is less commonly employed in elementary statistical work than rectangular coordinates.

curve: see coordinates.

degree: in trigonometry, one three-hundred-and-sixtieth of the circumference. The degree of an equation is the highest sum of the exponents in any term of the equation; thus 4x + y = 7 is of the first degree, and $4x^{2} + y = 7$, or 4x + xy = 7, is of the second degree, etc. See dimension.

dimension: a dimension of an equation or of a term of an equation is the same as the degree of it. Thus $xy^{3}z^{2}$ is of the sixth degree, or sixth dimension. See degree.

distance: commonly used to indicate not merely spatial dimension, but also dimension in any quantitative function. Thus we may speak of the distance one score is from another, though the function may be intelligence, reaction time, or whatnot.

equivalent: equivalent measures are those 
expressing the same thought in different terms, 
thus 39.37 inches are equivalent to 100 centi- 
meters. See page 32. 

evaluation: see formulation.

factor: in addition to the ordinary meaning 
that a factor is one of the quantities entering 
into a product, the term is used quite extensive- 
ly to indicate one of the components entering into 
a sum, particularly if the components making up 
the sum are uncorrelated with each other. 

function: this is an aristocrat of mathematical terms. It is indifferent to such lesser terms as variable, constant, exponent, such operations as multiplication or addition, and such specific relationships as subordination, correlation, association, though all of these serve it and give it specific meaning. To say that y is a function of x asserts that operating in some manner, which he who knows may designate, upon x yields y, and furthermore that y is gotten in no other way. Distinctions between functions which are commonly important for statistics may be found in whether the function is multiple-valued, single-valued, or monotonic, as explained in a preceding paragraph in connection with curves plotted upon rectangular coordinates.

formulation: the determination of the specific nature of a function. Evaluation is the process of finding the numerical value of y, which is a function of x, for a given value of x.
imaginary number: in mathematics the square root of a negative quantity is called an imaginary number, and is contrasted with real numbers, those having finite values between $-\infty$ and $+\infty$. In addition to quantities imaginary in this respect, there are in statistics values which are "impossible" or "unreal," such as a product-moment correlation coefficient greater than one, a reliability coefficient less than zero, or a negative standard deviation. To avoid confusion, these latter had better not be designated imaginary, however much they are a matter of fiction.

increment: a term used ordinarily to indicate a small value or a small change in value, and quite commonly represented by δ or Δ. Such a small increment approaches the differentials of calculus as it becomes smaller and smaller.

invariant: this highly useful concept, wherever employed, is peculiarly valuable in connection with mathematical and statistical problems. If the vertical, the north-south, and the east-west dimensions of an egg are taken as it is held in various positions, there result a great many values, all different from each other. These dimensions are variable as transformations (changes in position) are made. (This geometric explanation of "transformation" is in harmony with the algebraic explanation given under the term transformation.) If the ratio of the longest diameter to the shortest diameter is computed for each position, it remains the same and is accordingly invariant under the transformations. These transformations may be considered to be merely different ways of viewing the same thing. If a thing is chameleonlike and looks different as each point of view is taken, there is no virtue in it, but if some aspect of it can be selected such that it is the same no matter the point of view (in mathematical terms, no matter the axes of reference), it becomes a thing in which there is no shadow of turning, and in which truth resides. The discovery of the invariants in phenomena is the main concern of statistical and experimental science.
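The egg illustration can be imitated in two dimensions, under the simplifying assumption (ours) that the egg's outline is an ellipse with hypothetical semi-axes 3 and 2. Turning the figure, which is a transformation in the sense above, changes its east-west and vertical extents, but the ratio of its longest to its shortest diameter stays at 1.5.

```python
import math

A, B = 3.0, 2.0   # hypothetical semi-axes of the egg's outline, taken as an ellipse
M = 360           # boundary points sampled; a multiple of 4 so both axes' ends are hit

def outline(theta: float) -> list[tuple[float, float]]:
    """Boundary points of the ellipse after turning it through angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    points = []
    for k in range(M):
        t = 2 * math.pi * k / M
        x, y = A * math.cos(t), B * math.sin(t)
        points.append((c * x - s * y, s * x + c * y))   # rotate (x, y) by theta
    return points

def measure(degrees: float) -> tuple[float, float, float]:
    """East-west extent, vertical extent, and longest/shortest diameter ratio."""
    points = outline(math.radians(degrees))
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    radii = [math.hypot(x, y) for x, y in points]   # the figure is centred on the origin
    return max(xs) - min(xs), max(ys) - min(ys), max(radii) / min(radii)

for degrees in (0, 30, 60, 90):
    width, height, ratio = measure(degrees)
    print(f"{degrees:2d} deg: east-west {width:.3f}, vertical {height:.3f}, ratio {ratio:.6f}")
```

The extents depend on the sampled points and the angle; the ratio does not, because turning the figure leaves every distance from its centre unchanged.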

literal notation: three sorts of elements enter into algebraic expression: (a) symbols of operation, or of relationship, (b) symbols indicating invariable features of the equation, that is, constants, and (c) symbols indicating variable features, that is, variables. A large class falling under (b) consists of the ordinary cardinal numbers. With quite a number of exceptions, it may be said that in this text letters near the end of the alphabet, particularly x and y, will represent variable quantities; letters near the beginning of the alphabet, particularly a and b, will represent constants, or invariable quantities; and symbols and Greek letters will represent operations. For more detail see the paragraph following upon symbols.

maximum and minimum: the meaning of the terms 
will be obvious to the reader, but their im- 
portance in statistical work is so great that 
they should receive definite obeisance whenever 
met in a statistical study. To assert that one’s 
purpose is to get the maximum of enjoyment out of 
life is a literary pleasantry, but to say that 
the object is to make errors minimal should hold 
the attention and vivify the interest until the 
means for accomplishing this become clear. Do 
not slip over these key words to understanding, 
these cues warning one of a chance to observe a 
bit of intellectual cleverness. The neatness of 
mathematics is incorporated into statistics 
through these concepts. 

monotonic: see coordinates.

observed value: this describes the value of a variable in distinction (a) from the value estimated from a knowledge of one or more correlated values, or (b) from an unknown true value.

origin: see coordinates.

orthogonal: see space.

percentage: a percentage is 100 times an equivalent proportion. If 5 out of 20 belong in a certain group, the proportion in this class is .25, and the percentage is 25. The subtlety of this relationship is not so great as to provide an excuse for the common misuse of these terms.

proportion: see percentage.

real number: see imaginary number.

score: a quantitative or qualitative recording of an observation.

space: in addition to the familiar variety, the student of statistics is frequently concerned with an n-space, or n-dimensional space, where n is greater than 3, because he can so readily represent four or more variables, each in an independent direction (orthogonal, or at right angles, to the others) in this space. The mental picture of four lines through a single origin, each at right angles to the others, is beyond one, and the student is not advised to weary his intellect trying to make such a picture. Rather, he should manipulate his four or more variables according to the rules of algebra, or he should reason by analogy with the geometry of two or three dimensions. In other words, when an n-dimensional proposition is presented, do not become panic-stricken and consign the treatment of it to mortals cast in a higher mold and endowed with a pineal eye.

statistic: a statistic is any quantitative or qualitative characterization of a thing observed, or any summarized statement of such. John Doe, aged 12, height 59 inches, is one of a group of 70 whose mean age is 13.0, average height 63.3 inches, in spite of one member being but 48 inches tall. Each of the descriptive items in this sentence (aged 12, height 59 inches, group of 70, mean age 13.0, average height 63.3 inches, 48 inches tall), as well as innumerable others not mentioned, is a statistic. Statistics is the plural of statistic, as defined, and also the entire body of sound practice which has been developed for summarizing observations and discovering invariant features of them. These summarized statements, such as mean age 13.0, do not "prove" points. They merely describe a situation. The expression "statistics prove" is always a misstatement. Statistics at their best "measure," permitting him who will to "infer." We correctly say that a yardstick "measures" the height of a child, not that it "proves" it.

symbols: it is thought that a mere mention of most of the symbols to be employed in this text should suffice, so that explanations of only a few follow:

= equality
≡ identity
≐ or ≈ approximate equality
≃ equivalence, see page 34


+ plus
− minus
÷, /, : division, thus a ÷ b, $\frac{a}{b}$, a/b, a:b
×, ·, or nothing at all, meaning multiplication, thus a × b, or a · b, or ab

∞, or ∞ as a subscript, an infinitely large magnitude, or a "true" magnitude, i.e., one having no chance error in it.


exponent, to indicate a power, thus $a^{5} = a \cdot a \cdot a \cdot a \cdot a$
√ indicates square root, ∛ indicates cube root, etc.
fractional exponent, to indicate root, thus $a^{\frac{1}{5}} = \sqrt[5]{a}$

subscript, to specify one of several of a sort; thus $x_1$, $x_2$ ordinarily stand for the score or measure on a first variable and on a second; $x_{1a}$, $x_{1b}$, etc., indicate the score of subject a on the first variable, of subject b on the first variable, etc.

Σ, Greek capital sigma, to indicate the sum of all the measures in question; thus $\Sigma x_1$ equals the sum of all the $x_1$ measures for the group in question, and equals

$x_{1a} + x_{1b} + \cdots + x_{1n}$

if there are n measures.

··· to indicate omitted items of a series, as shown just above.

)( to indicate an excluded term of a series; thus

$r_{12.3\cdots)7(\cdots 9} = r_{12.345689}$


Π, Greek capital pi, used as a symbol of operation indicating the product of all the measures in question; thus

$\Pi x_1 = x_{1a} \cdot x_{1b} \cdot x_{1c} \cdots x_{1n}$

if there are n measures.

e = 2.71828183, the Naperian base of loga- 
rithms, which is a magnitude entering into 
many formulas. 

N indicates the number of measures in a series. When used as a subscript, as in $x_{1a}, x_{1b}, \ldots, x_{1n}$, n may be used in lieu of N because its size makes it better adapted for use as a subscript.
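In a programming language the Σ and Π operations correspond directly to summing and multiplying over a collection. In the sketch below, the four scores are hypothetical values of $x_{1a}, \ldots, x_{1d}$, and the last computation spells out the secondary subscripts of the excluded-term example.

```python
import math

# Hypothetical scores x_1a, x_1b, x_1c, x_1d of four subjects on the first variable
x1 = {"a": 12, "b": 15, "c": 9, "d": 14}

sigma_x1 = sum(x1.values())       # Σx_1 = x_1a + x_1b + ... + x_1n
pi_x1 = math.prod(x1.values())    # Πx_1 = x_1a · x_1b · ... · x_1n
n = len(x1)                       # n (or N), the number of measures

# Excluded term: secondary subscripts 3 through 9 with )7( left out
secondary = "".join(str(k) for k in range(3, 10) if k != 7)

print(sigma_x1, pi_x1, n, "r_12." + secondary)
```

This prints `50 22680 4 r_12.345689`.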

transformation: the substitution of a second variable for a first of which it is a function, generally a linear function. Thus, if relationships involving x are given, and in lieu thereof relationships involving y are determined, where x = a + by, a linear transformation has been made. See invariant.

transmutation: used synonymously with transformation, which is the mathematical and approved term. No alchemical implications are involved.

variable: see literal notation.

SECTION 5. PLAN OF STUDY

If the reader has looked upon statistical method as a tool to be resorted to when necessity dictates, it probably holds a very lowly status in his mind. In one sense this is correct, for statistics is definitely a servant to the sociological, biological, and political sciences, but a very independent servant in his methods: a servant who cannot be ordered to "prove a point," but only to investigate it; a servant who repeatedly tells his master, the setter of the problem, that he is on the wrong track, that the problem is incapable of solution with the data at hand, or that the refinements of procedure called for if a solution is to be obtained are far beyond those anticipated.

As an adviser of graduate students, the writer has had "worthwhile" issues presented to him time and again with no, little, or inadequate knowledge of what is implied in reaching a solution. As one of many examples he could cite the case of the student who wished to prepare himself to make analytical diagnoses of the abilities of pupils and to give them sound educational and vocational guidance, but who had not the slightest idea of studying or utilizing partial or multiple correlation. To such a student statistics asserts its independence, demands techniques, defines the limits of undertakings as dependent upon data and treatment, and finally pontificates in the matter of the reliability of diagnoses. Statistics is the servant in the house, but, whether known or unknown to the master, is guided by rules of conduct so full of virtue that they may not be overstepped.

The student who really attains a frame of mind 
that is sympathetic to statistics must find 
satisfaction in neatness, dispatch, soundness, 
and generality of procedure, not satisfaction 
only in the proof of a cherished belief. In fact, 
the first-mentioned satisfactions should be so 
great that a disproof of a belief carries but a 
minor sting. 

The usual elementary text in statistics seeks
to acquaint students with the techniques necessary
to obtain a number of highly valuable summarizing
statistics (averages, measures of variability,
and a few measures of correlation) and to give
them mastery of some of the elementary graphic
devices. This text considers these things as a
secondary aim, the primary aim being to acquaint
the student with the logic back of these procedures.
It is believed to be more important for a student to
understand why a mean is employed than to know
the steps for calculating it; why and when a
product-moment correlation coefficient is significant
than to master the intricacies of the
correlation chart. This broader purpose necessitates
a different and a more extended treatment
of so-called elementary issues than is usual. It
is accordingly here attempted to cover in the
earlier chapters a smaller number of different
statistical concepts than are usually covered in
an elementary book, but to do so with greater
than usual logical completeness.

In that this text is designed to serve as an
introduction to advanced study, the notation used
is that appropriate to advanced statistics. For
example, it would be quite possible to write an
elementary statistics text without recourse to letters
of the Greek alphabet. This, however, is not
here attempted, for Greek letters will not only
serve in the elementary problem but will also
provide familiarity with concepts essential
in more advanced statistics. Special care has
been taken to provide the simplest notation
congruent with the prospective needs of the student.
This occasionally, but only occasionally, leads
to a notation which is a trifle more elaborate
than the immediate problem demands.

PROBLEMS 

Problem 1. 

As a vocabulary review the student may test 
his understanding of the following words and 
phrases. 


abscissa
absolute value
arbitrary origin
argument
axis
biometry
central tendency
consequent
constant
coordinate
coordinates
correlation
crude score
curve
degree
determinate
diameter
dimension
distance
equality
equivalent
evaluate
factor
finite
formulation
function
identical
identity
imaginary
increment
indeterminate
interpolation
invariable
invariant
line
mathematics
maximum
minimum
monotonic
multiple-valued function
negative
ordinate
origin
origin, arbitrary
polar coordinates
positive
real number
reduce
score
single-valued function
solution
space
statistic
sum


Problem 2. 


With the general principles covering the propa- 
gation of error in mind, place a star over the 
appropriate figure in the right-hand members of 
the following equations to indicate the most 
probable location of the errors in the answer 
consequent to errors as designated by stars in 
the terms of the left-hand members. 


1. 123.5 + 11.62 = 135.12
2. 154.42 + .014 = 154.434
3. 95.75 - 13.4 = 82.35
4. 67.805 - 13.60 = 54.205
5. 16.10 + 5.300 = 21.400
6. 103 × .305 = 31.415
7. 41.5 × .014 = .5810
8. .07 × 8.6 = .602
9. 20 × 15.0 = 300.0
10. .007 × .44 = .00308
11. 108.6 ÷ 3.0 = 36.2
12. .0747 ÷ .1245 = .6000
13. 192 ÷ 1.6 = 120.0
14. 273 ÷ .007 = 39,000
15. (2.7)² = 7.29
16. (1.5)³ = 3.375
17. (.0012)² = .00000144
18. √225 = 15.00
19. √3.61 = 1.9000
20. √.000529 = .02300


These exercises have been given because of
the common error of recording far more figures
than are significant.
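The usual rules of thumb behind such exercises can be sketched in Python. The sketch assumes two common conventions. A sum or difference should carry no more decimal places than its least precise term. A product or quotient should carry no more significant figures than its least precise factor. It also assumes that trailing zeros in a whole number such as 20 are significant. These conventions approximate the reasoning the problem asks for but are not necessarily identical to it, and the function names are illustrative only.

```python
from decimal import Decimal


def decimal_places(numeral: str) -> int:
    """Digits recorded after the decimal point: '.014' -> 3, '103' -> 0."""
    return max(0, -Decimal(numeral).as_tuple().exponent)


def significant_figures(numeral: str) -> int:
    """Digits recorded, not counting leading zeros: '.0747' -> 3, '15.0' -> 3.

    Assumption: trailing zeros of a whole number such as '20' are counted.
    """
    return len(Decimal(numeral).as_tuple().digits)


def round_to_places(value: Decimal, places: int) -> Decimal:
    # quantize uses the default context rounding, round-half-to-even.
    return value.quantize(Decimal(1).scaleb(-places))


def round_to_figures(value: Decimal, figures: int) -> Decimal:
    # adjusted() is the power of ten of the leading digit (31.415 -> 1).
    return value.quantize(Decimal(1).scaleb(value.adjusted() - figures + 1))


def combine(a: str, op: str, b: str) -> Decimal:
    """Apply op to two recorded numerals, keeping only the figures they justify."""
    x, y = Decimal(a), Decimal(b)
    if op in ("+", "-"):
        exact = x + y if op == "+" else x - y
        # Sums and differences: no more decimal places than the coarsest term.
        return round_to_places(exact, min(decimal_places(a), decimal_places(b)))
    if op in ("*", "/"):
        exact = x * y if op == "*" else x / y
        # Products and quotients: no more significant figures than the weakest factor.
        return round_to_figures(exact, min(significant_figures(a), significant_figures(b)))
    raise ValueError(f"unsupported operator: {op}")


for a, op, b in [("123.5", "+", "11.62"), ("103", "*", ".305"), ("273", "/", ".007")]:
    print(a, op, b, "->", combine(a, op, b))
```

For instance, combine("103", "*", ".305") keeps three figures and gives 31.4 rather than the recorded 31.415.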

The following problems in plotting mathematical
equations range from very simple to difficult,
though these latter (Problems 7 and 8)
involve nothing beyond the handling of logarithms
and negative exponents.

Problem 3. Plot the straight lines 

(a) y = 4 + 3x 

(b) 2y = 4 + 3x

Relate the slope of each line as shown graphi- 
cally with the coefficients of x and y. 

Problem 4. Plot the third degree parabola

$$y = 1 + 3x^2 + x^3,$$

which may be written

$$y = 5 + (x - 1)(x + 2)^2.$$

Relate the graph to this second way of writing
the equation.
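Taking the equation to be $y = 1 + 3x^2 + x^3$, expanding the factored form shows that the two ways of writing it agree. It also shows where the curve meets the level y = 5:

```latex
\begin{aligned}
(x - 1)(x + 2)^2 &= (x - 1)(x^2 + 4x + 4) \\
                 &= x^3 + 3x^2 - 4, \\
5 + (x - 1)(x + 2)^2 &= 1 + 3x^2 + x^3 = y.
\end{aligned}
```

The product vanishes at x = 1 and at x = -2, so the curve passes through (1, 5) and (-2, 5). Because (x + 2) enters squared, the curve only touches the level y = 5 at x = -2 and does not cross it there.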

Problem 5. In Table XV C the proportion of
cases in a unit normal distribution lying to the
left of the corresponding x-value is p. The
equation relating p and x is

$$p = \int_{-\infty}^{x} z \, dx, \qquad \text{in which} \qquad z = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}.$$

Since the related values of p, z, and x are given
in Table XV C, no computations are necessary in
making the following graphs: (a) z as a function
of x, and (b) p as a function of x. Graph (a) is the
highly important normal distribution and (b) is
the graph of its integral.
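Since Table XV C is not reproduced here, the values can also be generated directly as a check. This sketch uses Python's standard-library math.erf and the identity p = ½[1 + erf(x/√2)] for the unit normal integral. The range and spacing of x are arbitrary choices.

```python
import math


def normal_ordinate(x: float) -> float:
    """z = e^(-x^2/2) / sqrt(2*pi), the unit normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def normal_integral(x: float) -> float:
    """p = proportion of the unit normal distribution lying to the left of x."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


# Tabulate x, z, p from -3 to 3 in steps of 0.5, ready for plotting.
for i in range(-6, 7):
    x = i / 2
    print(f"{x:5.1f}  {normal_ordinate(x):.4f}  {normal_integral(x):.4f}")
```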

Problem 7. The simple logistic is an important
growth curve. Its equation in the general
form is

$$y = \frac{k}{1 + e^{a + bx}} \qquad \text{or} \qquad y = \frac{k}{1 + A e^{bx}}.$$

Dr. Marian Wilder has determined the constants
k, a, and b for the logistic which fits as
closely as possible (in the least squares sense)
the integral of the unit normal distribution.
The equation is

$$y = \frac{1}{1 + e^{-1.6637x}}.$$

Plot this and compare with (b) of Problem 6.
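Here is a sketch of the comparison, assuming the fitted equation reads y = 1/(1 + e^{-1.6637x}), that is, k = 1, a = 0, b = -1.6637. It measures the largest vertical gap between the logistic and the normal integral over a grid. That is one simple way to quantify how closely the two curves agree; the grid range and spacing are arbitrary.

```python
import math


def normal_integral(x: float) -> float:
    """Proportion of the unit normal distribution lying to the left of x."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def fitted_logistic(x: float, scale: float = 1.6637) -> float:
    """Simple logistic k / (1 + e^(a + bx)) with k = 1, a = 0, b = -scale."""
    return 1 / (1 + math.exp(-scale * x))


# Compare the curves on a grid from -4 to 4 in steps of 0.01.
grid = [i / 100 for i in range(-400, 401)]
gaps = [abs(fitted_logistic(x) - normal_integral(x)) for x in grid]
gap = max(gaps)
worst_x = grid[gaps.index(gap)]
print(f"largest difference {gap:.4f} at x = {worst_x:.2f}")
```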


Problem 8. The Gompertz curve 


y 


- b 


is also a growth curve. Dr. Marian Wilder has 
determined the constants a and b for the Gompertz 
curve, which fits as closely as possible (in the 
least squares sense) to the integral of the unit 
normal distribution. The equation is 

y = 1 . 9 q19~ 3 ' 1500 


Plot this and compare with (b) of Problem 6.



CHAPTER II 

STATISTICAL SERIES 


SECTION 1. TYPES OF DATA

Logically we may classify series as follows:

I. Temporal: (a) qualitative or (b) quantitative
differences, shown as time changes.

II. Spatial or Geographic: (a) qualitative or
(b) quantitative differences, shown as
location in space changes.

III. Momentary (or Pseudo-momentary): (a) qualitative
or (b) quantitative differences,
shown as the items in a sample change.

Under pseudo-momentary falls the large class of
observations which are assumed not to have been
altered by the particular moment or place of
observation. For example, a chemist's records
of a week ago, of a year ago, and of today may
frequently all be thrown into a common distribution,
on the belief that the moment or place of
observation has been immaterial to the issue at
hand.

The following naming of series has been here
adopted:

I. (a) Qualitative-temporal series.

(b) Temporal series.

II. (a) Qualitative-spatial, or qualitative-geographical,
series.

(b) Spatial, or, if related to location
upon the earth, geographical series.

III. (a) Qualitative series.

(b) Quantitative series.

Let us consider these in more detail. 

Temporal series: If a single thing, say a 
child, is observed at successive periods of time 
with reference to a single character (the term 
preferred by biologists), or trait, or character- 
istic, a time series results. Usually the suc- 
cessive measures are quantitative, as would be 
the case if height is measured, but they need not 
be. Thus if the successive observations of an
infant’s vocalization give the syllables blah,
ma, bab, omp, these constitute a qualitative time
series, since the differences in the trait, as time
varies, have been the issue.

Table II A is another illustration of a quali- 
tative-temporal series. As is generally the case, 
it is here obvious that fuller information would 
disclose quantitative phenomena underlying most 
of the qualitative statements listed. 

TABLE II A

IMPORTANT STEPS
IN THE AUTOMOBILE INDUSTRY*

1900  "Mass production" of motor cars first employed
1901  First American speedometer made
1902  Alloy steels used in making automobiles
1904  Head lamps included as standard equipment
1906  Front bumpers and electric horns pioneered
1908  Left-hand drive popular
1909  First closed bodies built
1911  Electric starting, lighting, and ignition first
      combined in a single system with storage battery
1912  Windshields made as part of body by one company
1915  Top and windshield became standard equipment
1915  Steel frame body introduced
1917  Disc wheels appear
1918  Tire sizes standardized
1919  High-pressure chassis lubrication adopted
1920  Windshield wiper universal
1921  Hydraulic brakes on some cars
1922  Balloon tires appear
1924  Bumpers became standard equipment
1925  Four-wheel hydraulic brakes and all-steel
      bodies used
1926  Rubber engine mountings, rubber spring shackles,
      and adjustable front seats introduced
1927  Gearshift standard on all cars
1928  Safety glass introduced as standard equipment
1929  Automobile radio introduced
1931  Vacuum spark control used
1932  Drop-center rims replaced "detachable"
1933  First steel top used on some cars
1934  Synchronized front and rear springs introduced
1935  All-steel bodies universal. Steering-post
      gearshift on one make
1937  Built-in windshield defroster ducts
1938  Improved windshield vision, independently
      sprung front wheels attract attention
1939  Steering-column gearshift becomes popular
1940  Sealed-beam headlights universally approved

* AUTOMOBILE FACTS, II, No. 2, October, 1939.

If one child at age 18 months is measured for
some trait, a second child at age 24, and a third
at age 39 months, etc., there results a series
(a pseudo-time series) which is not a true time
series, for no single principle of sampling has
been employed; that is, the successive items
represent differences with reference to (a) time
and (b) subject. When such series are treated as
time series it is upon the assumption that differences
in (b) are immaterial. This is generally
a hazardous assumption and should call for specific
study before being made. Most achievement
and intelligence test norms which are intended to
be time series or growth curves in the function
in question involve this assumption, for it has
seldom been attempted to base norms upon the same
subjects as they age from year to year. It is
commonly assumed, for example, that the thirteen-year-olds
fairly represent what the twelve-year-olds
will become in a year’s time. Expediency is
the excuse for this method of approximating true
time and true growth curves.

If the trait measured at successive periods of 
time is a developing one, it may be called a 
growth series, and, when plotted, a growth curve. 
A growth curve generally starts at a point near 
zero. It is not to be expected that the exact 
starting point can be caught; even if the start 
in the case of a living bisexual organism is 
called the moment when the sperm impregnates the 
ovum, still the point is arbitrary, for both 
sperm and ovum have had antecedent histories. 
Starting with an initial measure, near zero, the 
typical growth curve increases, with first in- 
creasing and later decreasing acceleration, to 
some physiological maximum at a date which may be 
fully as difficult to determine as the zero point 
date. 

In contrast to this typical growth curve is
the time series characterizing a much later stage.
An economist may not know the time and correlative
price of the first bushel of wheat. The starting
point is lost in the remote past, but the fluctuation
of the price of wheat from day to day and
month to month, as time is now changing, yields
a very typical time series, but it is no longer
called a growth series.

Typical issues arising in connection with time
series are not the same as for other series. In
Chapter V it is pointed out that the essential
issues, the essential techniques, and the essential
summarizing statistics all change as the
type of series changes.

As a second type of series may be mentioned
the geographic series, or, to use the more comprehensive
term, the spatial series. In this
case differences in time are either (a) avoided
by collecting all items at the same time, or (b)
considered irrelevant, but the locations from which
the data are collected are of prime importance.
If the material collected has been distributed in
three-dimensional space, the very general spatial
series results. An illustration of such a series
would be a survey of the dust content of the air
in the different city precincts and at different
elevations above street level. Problems involving
data of this three-dimensional sort are becoming
more common because of air and submarine transport.
A not too large portion of the surface of
the globe may be considered a plane, so that the
ordinary spatial series involves differences in
locus of items in two dimensions only. The series
is then called a geographical series. Each term
collected is uniquely and intrinsically associated
with its point of origin. If this fact is lost
in any step of procedure, the series immediately
ceases to be a geographical series. If, in a plot
of ground laid off like a checkerboard, one unit
of area reveals 2 cutworms, a second unit 23
ants, a third 1 glowworm, etc., these successive
qualitatively different items attaching to the
differences in location constitute a geographical
series. The more ordinary sort is found when the
differences revealed are quantitative, as would
be the case if one unit of area, perhaps an entire
township, yielded 10,000 bushels of wheat, a second
zero bushels, a third 5,000 bushels, etc. The
essential statistics of geographical series are
even more specialized than are those of a time
series.

The map is essential in portraying spatial
series. Many spatial series show both qualitative
and quantitative differences, in which case considerable
ingenuity is needed to devise a map with
cross sectioning, or a color scheme, to portray the
essential facts. Spatial series are intrinsically
more amenable to graphic treatment, and less to
numerical treatment, than temporal or quantitative
series. The maps of the U. S. Coast and
Geodetic Survey, of the Weather Bureau, and of the
Census Bureau show the high degree of completeness,
variety, and detail of portrayal that is possible.
The groupings of territories in spatial series and
the subdivision of areas may follow conventional
procedure or the peculiar needs of the problem.
The order adopted by the Census Bureau in giving
population statistics is shown in Chapter IV,
Table S.

The third and fourth types of series are the
qualitative and the quantitative, which exist when
the principle of sampling has either allowed for
time and space by holding them constant, or
treated as immaterial the differences in time and
space at which the successive observations have
been made. Thus some important
consideration other than, or in addition to, time
or space has been utilized as the principle in
getting the cases of the sample, and of course
some definite principle has been followed in making
the observations or taking the measurements of the
case. If the resulting characterizations of the
observations taken differ qualitatively one from
another, the series is a qualitative or categorical
series, whereas if they differ in terms of amount
it is a quantitative series. Observations upon
eye color, or hair color, or blood type, or sex,
or abnormality of fifth-grade school children
would yield qualitative series, whereas recording
the height, or weight, or age, or psychological
test score, or achievement test score, etc., would
yield quantitative series. One immediately deducible
property of the items of a quantitative or
qualitative series is that they are not tied to a
specific time and place of origin; that is, time and
place are neglected when studying them.

Thus in every instance and for every sort of 
series the observations taken cover but a fraction 
of reality. If numbers of bushels of wheat are 
the elementary statistics, the animal life in the 
unit of area is not. If heights of twelve-year- 
old boys are the elementary statistics, their 
places of birth and dates of birth, except as 
such might define the principle of sampling 
twelve-year-olds, are not. And so it goes. We 
are continually studying aspects of total situ- 
ations, and it must be so, whether the method of 
study is admittedly statistical or not. 

When a research undertaking is planned, it
will be found conducive to later precise treatment
if an attempt is made to relate the issues
to be studied to some one of the four types of
series. If this is impossible, the statistical
handling of the resulting data becomes complex,
though not necessarily insoluble.

The classification here followed of simple
series as being temporal, geographical (or, more
generally, spatial), quantitative, and qualitative,
is not identical with that made by certain
statisticians. Yule (1929) divides statistics
into the statistics of attributes and the statistics
of variables. These correspond to qualitative
and quantitative, as herein given. Jerome
(1924) divides series into historical and cross-section,
and the latter he divides into geographical,
qualitative, and quantitative, thus
agreeing with the classification herein used.
Another common classification is into discrete
and continuous series. This is made by Garrett
(1926), Thurstone (1925), Rietz (1924), and Young
(1925). R. A. Fisher (1934, page 25) classifies
certain series as temporal and again he classifies
them (page 4) as continuous and discrete. It
should be noticed that quantitative and qualitative
are not synonymous with continuous and discrete.
The initial data in quantitative series
may be continuous, as, for example, heights of
twelve-year-olds, or discrete, as, for example,
the numbers of children in families; but the
initial data in qualitative series can only be
discrete. The merit of any classification is in
its utility. In general, the reader will find
that a definite set of statistical techniques
follows from the type of series involved. This
is true for the four types here employed. If the
quantitative is further subdivided into continuous
and discrete, it will be found that there are
separate statistical techniques which are in harmony
with each.

SECTION 2. TYPES OF ISSUES

We have considered the types of data occurring 
in the different series. Let us now inquire as 
to the differences in issues consequent to differ- 
ences in series types. 

In time series one is concerned with the fluctuation
of some single function with changes in
time. If the function is greater at one time
than another, some measure of this excess is important.
If a natural zero point, or upper limit,
such as a 100 per cent point, is not available,
then some gross measure of difference must be
employed, but in the very common case where one
or both of these natural limits are known (limits
in terms of the amount of the magnitude, not
limits in terms of the time at which the zero
amount or 100 per cent amount is attained), the
obvious measure of relationship is the ratio of
the magnitude of the function at one time to its
magnitude at another; and since the zero magnitude
at the beginning of growth is never an observed
value, one does not secure infinite ratios.
Should one secure an infinite ratio because the
magnitude at some time after the start becomes
zero, then the ratio as a statistic loses its
value. In general, the ratio is a statistic
peculiarly adapted to the study of temporal
series.

The price of a commodity at a given date, 
divided by the price at a second date, is such a 
ratio. If a second commodity for the same two 
dates is involved, a second price ratio is ob- 
tained. Composites and aggregates of such ratios 
or relatives are called (price) indexes, and 
these are essential tools for the study of time 
series. 

If the price in 1934 is expressed relative to
the price in 1933, the year 1933 is called the
basal year. The choice of such a basal moment of
time, the moment whose magnitude serves as the base
for expressing all other magnitudes, is essentially
a problem of time series.
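The following sketch computes price relatives and two simple composites. It uses hypothetical prices for two commodities, with 1933 as the basal year. The unweighted mean of relatives and the simple aggregative index shown here are only two of many possible index formulas. The text does not prescribe either, and weighting is ignored.

```python
def price_relatives(prices: dict, base_year: int) -> dict:
    """Price in each year divided by the price in the basal year."""
    base = prices[base_year]
    return {year: price / base for year, price in prices.items()}


def mean_of_relatives(table: dict, base_year: int, year: int) -> float:
    """Unweighted arithmetic mean of the commodities' price relatives for `year`."""
    rels = [price_relatives(p, base_year)[year] for p in table.values()]
    return sum(rels) / len(rels)


def simple_aggregative(table: dict, base_year: int, year: int) -> float:
    """Total of the prices in `year` divided by their total in the basal year."""
    now = sum(p[year] for p in table.values())
    then = sum(p[base_year] for p in table.values())
    return now / then


# Hypothetical prices in dollars per bushel (not data from the text); 1933 is basal.
table = {
    "wheat": {1933: 0.80, 1934: 1.00},
    "corn": {1933: 0.50, 1934: 0.60},
}
print(price_relatives(table["wheat"], 1933))
print(mean_of_relatives(table, 1933, 1934))
print(simple_aggregative(table, 1933, 1934))
```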

There is frequently evidence of a systematic 
change in magnitude over a considerable period of 
time. This is called a trend, and a line indi- 
cating it when the series is plotted is a trend 
line. The trend is an essential statistic of 
many a time series. 
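The text does not say how a trend line is to be drawn. One common choice, assumed here, is a straight line fitted by least squares. The sketch computes its intercept and slope for a hypothetical annual series and the fluctuations about it.

```python
def linear_trend(times, values):
    """Least-squares straight line values approx a + b*t; returns (a, b)."""
    n = len(times)
    t_bar = sum(times) / n
    y_bar = sum(values) / n
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, values))
    sxx = sum((t - t_bar) ** 2 for t in times)
    b = sxy / sxx
    a = y_bar - b * t_bar
    return a, b


# Hypothetical annual values, 1930 through 1935; t counts years from 1930.
t = [0, 1, 2, 3, 4, 5]
values = [10.0, 12.5, 11.8, 14.1, 15.0, 16.9]
a, b = linear_trend(t, values)
trend = [a + b * ti for ti in t]
deviations = [v - tv for v, tv in zip(values, trend)]  # fluctuation about the trend
print(f"trend: {a:.2f} + {b:.2f} t")
print([round(d, 2) for d in deviations])
```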

There may be a regular order of fluctuation
as time changes, yielding cyclical phenomena, and
the study of these and of their periods is peculiar
to time series. Any time series of appreciable
duration (in studying etheric vibration .001 of
a second would be a very appreciable duration)
may be expected to show periodic fluctuations.
As a consequence one of two procedures is necessary,
depending upon whether it is desired (a) to
study the changes within a certain cyclical
period, or (b) to study trends independent of
such periodic changes. Illustrations will make
the problem clear:

(a) Let it be required to ascertain the nature
of the load of an electric-power generating plant
during a twenty-four-hour period. The current
consumed per hour for some one day could be tabulated
or plotted. The result would have only
such accuracy as would result from a single day’s
sampling. To obtain a more reliable picture, a
number of days could be combined and the tabulation
made showing the average load for each
hour of the 24. Obviously error might creep in
here, for the load on a Monday would be quite
different from that on a Saturday or Sunday, and
perhaps different from that on the other days of
the week. With due allowance for holidays, probably
a very satisfactory idea of the hourly fluctuations
of the Monday load could be obtained by
pooling results for several Mondays. Differences
in daylight, temperature, etc., would make it
unsound to combine all the Mondays in the year.
The problem cited is typical of temporal series
problems, and the principle that should guide
one in pooling results is to group as
wide a range of data as is typical with respect
to the characteristic under investigation, but
not affected by other seasonal or systematic
tendencies.
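The pooling in (a) can be sketched as follows: average the load for each hour of the day, using only days that fall on a chosen weekday. The readings and the function name are hypothetical. The sketch leaves to the caller the further restrictions the text calls for, such as omitting holidays and keeping to Mondays of a comparable season.

```python
from collections import defaultdict
from datetime import datetime


def average_hourly_load(readings, weekday):
    """Mean load for each hour of the day, pooling only days on the given weekday.

    readings: iterable of (timestamp, load) pairs; weekday: 0 = Monday ... 6 = Sunday.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for stamp, load in readings:
        if stamp.weekday() == weekday:
            totals[stamp.hour] += load
            counts[stamp.hour] += 1
    return {hour: totals[hour] / counts[hour] for hour in sorted(totals)}


# Hypothetical readings: two Mondays and one Saturday, two hours each.
readings = [
    (datetime(1940, 1, 8, 9), 120.0),   # Monday
    (datetime(1940, 1, 8, 10), 130.0),
    (datetime(1940, 1, 15, 9), 110.0),  # Monday
    (datetime(1940, 1, 15, 10), 150.0),
    (datetime(1940, 1, 13, 9), 60.0),   # Saturday, left out of the Monday pool
    (datetime(1940, 1, 13, 10), 70.0),
]
print(average_hourly_load(readings, weekday=0))
```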

(b) Let it be required to ascertain the nature
of the seasonal fluctuations of the load. In
this case a tabulation by weekly units would be
best, as this would completely suppress both the
Saturday and Sunday and the hourly idiosyncrasies.
With this in mind, it is seen that a tabulation
by six- or eight-day or monthly periods would not
be as satisfactory as one by weekly or bi-weekly periods.
The principle to follow is to use such a temporal
unit as equals, or is an integral multiple of, the
period within which occur the tendencies which
it is desired to suppress.
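The principle in (b) can be checked on made-up data. The sketch assumes a daily load that repeats the same weekly pattern and has no seasonal change at all. Blocks of 7 (or 14) days then come out constant, as they should. Blocks of 6 days show a fluctuation produced entirely by the choice of unit.

```python
def block_totals(daily, block):
    """Sum consecutive runs of `block` days; a trailing partial block is dropped."""
    return [sum(daily[i:i + block]) for i in range(0, len(daily) - block + 1, block)]


# Hypothetical daily loads: the same weekly pattern (lighter Saturday, light Sunday)
# repeated for eight weeks, with no seasonal change at all.
week = [100, 100, 100, 100, 100, 60, 40]
daily = week * 8

weekly = block_totals(daily, 7)    # unit equal to the period to be suppressed
biweekly = block_totals(daily, 14)  # an integral multiple of that period
six_day = block_totals(daily, 6)   # a unit that is not a multiple of it
print(weekly)
print(six_day)
```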

Without claiming that ratios, periods, trends,
etc., do not occur, or are unimportant, in other
than temporal series, still it should be noted
how intimately the statistic of importance is
tied to the series type. This intimacy is
apparent whether graphic or algebraic treatment
is involved.

The geographic series is much less amenable
to algebraic treatment than the temporal, because
the issue is connected with a place, not with a
date which may be represented by a one-dimensional
time continuum. An equally simple and well-known
algebraic representation for geographical position
is not available. The tightness with
which the issue is connected with geographic
position is the crux of the geographic series.

A lessening of the bond should never be attempted.
Accordingly, the two-dimensional map becomes the
basic instrument in the treatment of geographical
series data. The richness and accuracy with
which data may be presented upon a map is nicely
shown by the maps of the Coast and Geodetic
Survey. No parallel neatness in algebraic treatment
is available, but one statistic, the center
of population based upon a two-dimensional
distribution of cases, has been worked out algebraically,
thus becoming a statistic which is
unique to the study of geographical series.
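The center of population can be sketched as a population-weighted mean position. This assumes, as the text allows for a not too large area, that the region can be treated as a plane, with x and y measured in miles from an arbitrary origin. The towns and populations are hypothetical. For an area large enough for the earth's curvature to matter, a more careful computation would be needed.

```python
def center_of_population(places):
    """Population-weighted mean position on a plane map.

    places: iterable of (x, y, population), with x and y map coordinates
    (for example, miles east and north of an arbitrary origin).
    """
    places = list(places)
    total = sum(pop for _, _, pop in places)
    x_c = sum(x * pop for x, _, pop in places) / total
    y_c = sum(y * pop for _, y, pop in places) / total
    return x_c, y_c


# Hypothetical towns: (miles east, miles north, population).
towns = [(0.0, 0.0, 5000), (40.0, 10.0, 15000), (10.0, 30.0, 5000)]
print(center_of_population(towns))
```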
sentially connected with moments of time or positions
in space, for, though noted at a specific
time and place, they hold good after the subjects
have gone to a six o’clock dinner.

Furthermore, there is no obvious quantitative
relationship between nationalities. What has one
learned by such a tally? Only the numbers in each
category (a term used only when rubrics are discrete)
or class (a term used when rubrics are
either discrete or continuous). The numbers or
the relative numbers in each class and the relationship
of these from class to class are the
only issues involved. Such a concept as the mean
nationality does not enter in. What is the mean
nationality of three Germans, four Irishmen, two
Frenchmen, one Japanese, and seven Americans?
The essence of qualitative data is that the items
in it are not differences in amount on a single
continuum.

To the biologist the concept of the gene or
allelomorph is essentially that of the qualitative
series antecedent to the mapping of the chromosome,
after which these become concepts of a one-dimensional
spatial series. This nicely illustrates
a common characteristic of qualitative
series. The items frequently are qualitative
because of the inadequacy of one’s knowledge as to
spatial or quantitative relationships between
classes. One of the most fruitful lines of attack
upon qualitative data is the attempt to discover
a sense in which they are not qualitative.
However, if no such discovery is made, there
still remains the possibility of study of the
series through the statistics essential to this
type, namely, frequencies in a class, proportions
in a class, and ratios of such proportions.
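For a qualitative series the available statistics are the frequencies, proportions, and ratios of proportions just named. The sketch below uses the nationality tally mentioned above: three Germans, four Irishmen, two Frenchmen, one Japanese, and seven Americans. Nothing like a mean is computed, since the categories have no order or amount. The choice of Americans against Germans for the ratio is arbitrary.

```python
from collections import Counter

# The tally from the text.
tally = Counter({"German": 3, "Irish": 4, "French": 2, "Japanese": 1, "American": 7})

total = sum(tally.values())
proportions = {category: n / total for category, n in tally.items()}

# A ratio of proportions, here Americans relative to Germans.
ratio = proportions["American"] / proportions["German"]

for category, n in tally.most_common():
    print(f"{category:10s} {n:3d} {proportions[category]:.3f}")
print(f"American/German ratio: {ratio:.2f}")
```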

Lastly we have quantitative series, always
presenting two dimensions capable of quantitative
study: (a) different amounts of some single
thing, and (b) frequencies of occurrence of these
different amounts. In the study of such series
we obviously have the possibility of all the
issues connected with qualitative series, for,
neglecting the quantitative relationship of the
variable in question, we have data recorded
in classes, so that all qualitative series techniques
apply without any modification whatsoever.
The beauty of the quantitative series lies in the
fact that a much greater variety of problems presents
itself, and a much richer, more exacting, and more
descriptive set of statistics is available for
the analysis of relationships inherent in the
data. A typical quantitative series would result
if the scores of a certain class on a certain
achievement test are obtained. A record of each
score, together with the number of pupils receiving
it, constitutes the basic series.
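The basic quantitative series just described can be built from hypothetical test scores by pairing each score with the number of pupils receiving it. Because the scores differ in amount, a mean can also be taken. The nationality tally could not support such a statistic.

```python
from collections import Counter
from statistics import mean

# Hypothetical achievement-test scores for one class (not data from the text).
scores = [42, 45, 45, 47, 50, 50, 50, 52, 55, 58]

# The basic series: each score with the number of pupils receiving it.
frequency = Counter(scores)
for score in sorted(frequency):
    print(score, frequency[score])

# Because the scores differ in amount, the mean now has meaning.
print(mean(scores))
```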

Life’s problems do not confine themselves to 
single series, and certain methods have been 
developed for handling problems which are com- 
plexes of two or more of the four types mentioned, 
but it is well to recognize that in general the
problem and the method are functions of a single
series.

SECTION 3. TYPES OF STATISTICAL PROCESSES

The various processes of statistics may be
listed in the order in which they take place:
(a) definition and analysis of purpose, (b) collection
of raw data, (c) tabulation of raw data,
(d) computational procedures, (e) presentation
of results through the employment of summarizing
statistics or graphic devices.

(a) Definition and analysis of purpose: Antecedent
to the initial collection of facts is the
total life experience of the investigator, in the
light of which he has defined his problem in
such detail that the hypothesis upon which he is
building will be supported if the data yield
certain statistics; or, more rarely, but generally
with greater scientific insight, he has
multiple hypotheses, each in turn associated with
different or alternative statistical outcomes.

(b) The collection of data: The greatest care
must be exercised to insure that the sample dealt
with is "fair." It, of course, should be chosen
because of some principle of sampling in mind,
and the care should be that this, and not some
unthought-of basis, is actually operating. This
question may always be reduced to that of the
comparability of the sample chosen, as observed,
measured, or tested, with samples to be selected
and measured in the future, or with samples that
have been selected and measured in the past. The
word "fair" implies a comparison. Thus, not one
sample but at least two, and generally more,
should be in mind: on the one hand the immediate
sample which the experimenter is about to use,
and on the other hand the sample or samples with
which comparison is to be made. Between a sample
and its counter-sample, or contrast-sample, or
over-and-against-sample, all conditions should be
as nearly identical as possible except the one
whose influence is being sought, that is, except
the one under investigation. The existence of
this counter-sample is inherent in all problems,
though not specifically stated. Suppose one’s
expressed purpose is to find the mean 16-year-old
level of accomplishment upon a certain psychological
test. Assuredly the value in doing this would
be that it enables comparison with other groups
or individuals. If this counter-group is of 15-year-olds,
then they should differ from the first
group in that respect only, and not in such sundry
other respects as sex, nationality, etc. The
burden of insuring similarity between these two
groups should not be entirely or mainly upon the
shoulders of the person collecting the second
group. It is essential at the time of the collection
of the first sample that the nature of
subsequent contrasting samples be in mind, and
the more definitely the better. It is incumbent
upon the person collecting the first sample to
define it completely and fully. Failure to do so
is irremediable, for no care exercised by the
collector of a subsequent sample can allow for
undefined conditions in the first sample.

A different type of ability is called for to
insure the proper collection of data from that
necessary to its subsequent evaluation. For
example, in securing scores upon a psychological
or achievement test the collector should be
sensitive to individual differences of his subjects
which are abnormal, or, more accurately,
other than those assumed to hold. He should be
sensitive to any unusual conditions under which
the test is being administered, such as conditions
of light or state of discipline, and he should
be sensitive to temporary attitudes or conditions
of the subjects which are unusual. None of these
types of sensitivity is called into play in the
computational work which follows. It should not
be assumed that because a person is a good statistical
computer he is also a good collector of
data.

(c) The tabulation of data is usually a tiresome
and routine undertaking. There is, in the first
place, the assembling of the collected
quantitative and qualitative items. Though the
items have certain definitive individual
characteristics, such, for example, as the name
of the subject measured and the specific time and
place at which the measurement was obtained, these
are usually considered irrelevant to the issue,
and are not retained in tabulation. Just how much
should be retained and how much discarded should
not be decided upon the basis of the immediate
needs of the investigator, but rather in the light
of the presumable or probable interests and needs
of future investigators.



TYPES OF STATISTICAL PROCESSES

Investigators generally err by publishing too 
little original data. Examples of such failures 
are frequently present when sex of subject, or 
nationality, or even age, is not tabulated and 
made available to the reader, though each of 
these items is a part of the original record. 

There should be an initial tabulation of raw
data which includes, in addition to the items
which will be used by the investigator, such
other items as are available and as may be
pertinent to the problems of workers in related
fields. Then follow intermediate tabulations,
choosing and organizing data for the purposes of
the investigator. And, finally, there are
tabulations of the findings of the study (the
terminal statistics). Questions connected with
these various tabulations are treated at greater
length in Chapter III.
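The three stages of tabulation can be sketched in Python. The records, field names, and the choice of the mean as the terminal statistic are hypothetical; the point of the sketch is that the initial tabulation keeps items (here sex and nationality) that this investigator's own analysis never uses.

```python
from statistics import mean

# Initial (raw) tabulation: keep every available item that workers in related
# fields may need, even items this study will not use. Names are already
# dropped as irrelevant. All values are hypothetical.
raw = [
    {"age": 16, "sex": "M", "nationality": "American", "score": 52},
    {"age": 16, "sex": "F", "nationality": "American", "score": 58},
    {"age": 15, "sex": "M", "nationality": "American", "score": 47},
    {"age": 15, "sex": "F", "nationality": "Canadian", "score": 49},
]

# Intermediate tabulation: choose and organize only what this study needs
# (scores grouped by age).
by_age = {}
for record in raw:
    by_age.setdefault(record["age"], []).append(record["score"])

# Terminal statistics: the findings of the study (mean score at each age).
terminal = {age: mean(scores) for age, scores in sorted(by_age.items())}
print(terminal)
```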

(d) The statistical processes involved in the 
evaluation of data range from such simple opera- 
tions as making frequency tables and plotting 
distributions, time curves, etc., to the involved 
algebraic undertaking of deriving appropriate 
formulas and standard errors for the particular 
statistics being dealt with. Most of this text 
is concerned with the issues connected with the 
algebraic treatment and evaluation of data. 

(e) The presentation of results: No matter
whether the magnitude shown is represented by a
distance (graphic portrayal), a numerical amount
(quantitative series), or a frequency in a class
(qualitative series), there is always (1) some
stated or implied counter-magnitude with which it
is compared, and (2) some degree of credibility
attaching to the difference between the magnitude
and its counter-magnitude. The most precise way in
which this credibility is stated is in terms of
the probability that a difference as great as that
observed could have arisen as a matter of chance.
Concepts (1) and (2) are implicit, when not
explicit, in every statistical evaluation. For
example, to say that a child's mark in reading is
74 becomes meaningful only when some such added
concepts are present as, first, that the child is
a fifth-grade pupil in a school wherein the
passing mark is 70. There is also needed some idea
of the variability of fifth-grade pupils in their
ability; and, finally, the idea that the 74 is not
a perfect representation of the ability of the
child in question. To precise concepts of the sort
mentioned there should be added a further precise
concept: the probability that 74 is sufficiently
above 70 that chance does not account for the
excess. The method of reaching a precise statement
of probability cannot be given at this point, but
it should be obvious that some such concept,
precise or indefinite, is inherent in the
interpretation of the trustworthiness of, and
therefore the significance of, every statistical
difference. The reader should develop the habit of
appreciating that an answer in terms of
probability exists for every observed difference
between two statistical measures, even though he
may lack the necessary ability, or perhaps the
data, to arrive at a precise statement of it. Just
as error in statistical statements is ubiquitous,
so should be the student's awareness of this fact.
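Although the method is deferred, the kind of statement meant can be previewed. The Python sketch below assumes, purely for illustration, that the child's mark is subject to normally distributed error with a standard error of 3 points (a hypothetical figure; the text gives none), and computes the probability that a mark of 74 or more would arise by chance if the child's true standing were exactly at the passing mark of 70.

```python
import math


def normal_upper_tail(z):
    """Probability that a standard normal deviate exceeds z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def chance_of_excess(observed, standard, standard_error):
    """One-sided probability of an observed value at least this far above the
    standard if the true value equalled the standard (normal errors assumed)."""
    z = (observed - standard) / standard_error
    return normal_upper_tail(z)


# The standard error of 3 points is a hypothetical assumption.
p = chance_of_excess(observed=74, standard=70, standard_error=3.0)
print(f"P(mark >= 74 by chance alone) = {p:.3f}")
```

Under these assumptions the probability is about 0.09, so the excess of 74 over 70 is only moderately trustworthy; a smaller standard error would make it more convincing.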

If the differences are great, as, for example,
would be the case if the height of a normal adult
man is compared with that of a small child, and
if the fallibility of the measures is small, as
would be the case if the height of each is gotten
by precise methods, then the chance that the man
is not truly taller than the child but merely
seems so, due to the inaccuracy of observation,
is of the order, perhaps, of one in a million.
The first concept is that of the difference, and
the second that of its trustworthiness. Here the
second assumes a minor role, but the reader should
note that even here it is present. Further, it is
important to note that differences of this obvious
sort are such as society has long recognized and
become adjusted to, and they thus possess no
particular interest. The interesting issues quite
invariably attach to differences of such amount
that a certain element of uncertainty as to
trustworthiness persists. The crude techniques of
the remote past have established such facts as
that adults are taller than children, but it has
remained for refined procedures, carefully
expressing differences in terms of the probability
that they may have arisen by chance, to indicate
that, for example, the visual imagery of children
is more variable than that of adults. The former
issue (adults taller than children) is so well
established that it has ceased to be a problem,
while the latter is still a matter of interest,
and thinking about it involves both the concept of
a difference and the certainty or trustworthiness
with which it is established. These joint concepts
are the two terminal points of statistical
procedure.
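When both measures compared are fallible, their errors combine. A minimal Python sketch, assuming independent, normally distributed errors, so that the standard error of a difference is the square root of the sum of the squared standard errors; the heights and the 0.5 cm standard error are hypothetical.

```python
import math


def chance_difference_arose(d, se1, se2):
    """One-sided probability of observing a difference at least as large as d
    if the true difference were zero, assuming independent normal errors."""
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2)  # standard error of the difference
    z = d / se_diff
    return 0.5 * math.erfc(z / math.sqrt(2))


# Hypothetical: adult man 175 cm, small child 100 cm, each measured with SE 0.5 cm.
p_obvious = chance_difference_arose(175 - 100, 0.5, 0.5)
# Hypothetical: two heights differing by only 1 cm, measured just as carefully.
p_close = chance_difference_arose(1.0, 0.5, 0.5)
print(p_obvious, p_close)
```

The first case is the obvious difference in which trustworthiness plays a minor role (the probability is vanishingly small); the second, at roughly 8 chances in 100, is the kind of difference on which the interesting issues turn.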

SECTION THE SERVICE RENDERED BY ELEMENTARY 
AND BY ADVANCED STATISTICS 

The evaluation of data may involve the range
of statistical processes from the most elementary
to the most abstruse. On the one hand, statistics
may be resorted to in order to make specific and
quantitative a concept such as an average, or to
elucidate a relationship which the investigator
no more recognizes as statistical than he does
concepts of color, musical harmony, or beauty of
form. On the other hand, concepts may be developed
de novo from the data at hand and from a
consideration of mathematical relationships which
were not in the mind of the investigator at the
initial stage of his study. The making specific
and the testing out of these concepts represent a
functioning of statistics in its more powerful
aspects. The elementary student may well devote
his effort to finding appropriate mathematical or
statistical expressions for concepts or issues
which have been suggested to him from
nonstatistical sources. This calls for ingenuity
and leads to a development in thinking. As greater
mastery of, and confidence in, adapting
statistical techniques to the questions in mind
develop, the student may find that a reverse
process occasionally takes place: issues are
suggested by the mathematical and statistical
relationships discovered, and these issues demand
not only careful logical analysis but quite
frequently the collection and evaluation of new
data. When this is the case, statistics is no
longer merely a tool but also a source of
inspiration. Derivations of fundamental formulas
are the essential elements in training in this
latter stage, but they will play a minor role in
a first course. It is to be hoped that the reader
will be annoyed by the paucity of derivations in
this text, for that is a sign that the subject is
assuming a place in his processes of mental
analysis and is not merely a set of mechanical
rules of operation.

The purpose of classifying series into types
may not be apparent to the student at this stage
of his study. As the study of graphic portrayal,
averages, measures of variability, frequencies and
proportions in classes, etc., proceeds, it will
be found that there is a chain of dependent
techniques and of terminal statistics, graphic or
arithmetic, which follow from the type of series
dealt with. Thus, as soon as the series is
classified as of a type, the issues which are
presumably of importance, and the techniques for
working them up and presenting them, are
suggested.

It is true that much data may be classified
in more than one way. For example, the
achievement scores of pupils of different ages in
different schools of a city may be classified as a
quantitative series, a geographic series, or a
temporal series. If the first, one variable is
score and the other is frequency (age and school
being neglected); if the second, one variable is
school location as given on a map, and the other,
achievement score (or the other might be number of
pupils of a given age range or score range); if
the third, the assumption is made that those of a
given age are the same sort of individuals, so far
as achievement is concerned, as those one year
younger will become a year hence. Because of this
assumption, this should be called a pseudo-temporal
series, but the issues of interest, mental growth,
etc., will be the same as those in a true temporal
series. These three ways of looking at this simple
body of data do not exhaust the points of view that
may be taken. The student should realize that the
series type into which the data fall is nearly as
much a question of the point of view from which
they are viewed as it is a question of the
intrinsic structure of the data. When data may be
viewed as of more than one type, it frequently
occurs that one or more of these ways of looking
at them are more or less trivial, raising and
leading to the analysis of no important questions.
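The same body of achievement data can be cast into each of the three series types just described. The Python sketch below uses a handful of hypothetical records (school, age, score); nothing in it comes from an actual study.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical records: (school, age in years, achievement score).
records = [
    ("North", 10, 42), ("North", 11, 47), ("North", 11, 50),
    ("South", 10, 40), ("South", 10, 42), ("South", 11, 45),
]

# 1. Quantitative series: score against frequency (age and school neglected).
quantitative = Counter(score for _, _, score in records)

# 2. Geographic series: school location against mean achievement score.
by_school = defaultdict(list)
for school, _, score in records:
    by_school[school].append(score)
geographic = {school: mean(scores) for school, scores in by_school.items()}

# 3. Pseudo-temporal series: age against mean score, on the assumption that
#    this year's 10-year-olds will resemble this year's 11-year-olds a year hence.
by_age = defaultdict(list)
for _, age, score in records:
    by_age[age].append(score)
pseudo_temporal = {age: mean(scores) for age, scores in sorted(by_age.items())}

print(sorted(quantitative.items()))
print(geographic)
print(pseudo_temporal)
```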

A more detailed classification than that into
the four major types (temporal, spatial,
qualitative, and quantitative) is useful. The
following has been made because in each instance
one or more unique characteristics of treatment,
or of evaluation, are connected with the type in
question.

Temporal

T-t  = temporal series in which increases
       and/or decreases, not necessarily
       monotonic, are present. Include here
       T-t-r and T-t-i, temporal series
       involving ratios and indexes.

T-c  = temporal series in which periodic or
       cyclical phenomena are present.

T-g  = temporal series in which growth
       (monotonic) is present.

Spatial

G-c  = geographic series wherein there is
       continuous variation in the variable
       from one geographic location to a
       neighboring location.

G-d  = geographic series in which there is
       discontinuous variation in the variable
       from one geographic location to a
       neighboring location.

S    = spatial series, or one based upon
       location in three-dimensional space.

Qualitative

Ql-f = frequencies in qualitatively different
       classes.

Ql-m = magnitude of some function for
       qualitatively different classes.

Quantitative

Qn-c = frequencies in quantitatively different
       classes of a continuum.

Qn-d = frequencies in quantitatively different
       classes of a discontinuum.

Qn-o = frequencies in the ordered classes of a
       continuum.
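The classification above is a two-level scheme (major type, then code), which can be held in a small lookup structure. The Python sketch below is one possible encoding; the dictionary layout and the `major_type` helper are illustrative assumptions, not part of the text.

```python
# Series-type codes from the classification above, grouped under the four major types.
SERIES_TYPES = {
    "Temporal": {
        "T-t": "increases and/or decreases, not necessarily monotonic (incl. T-t-r, T-t-i)",
        "T-c": "periodic or cyclical phenomena",
        "T-g": "monotonic growth",
    },
    "Spatial": {
        "G-c": "geographic, continuous variation between neighboring locations",
        "G-d": "geographic, discontinuous variation between neighboring locations",
        "S": "location in three-dimensional space",
    },
    "Qualitative": {
        "Ql-f": "frequencies in qualitatively different classes",
        "Ql-m": "magnitude of some function for qualitatively different classes",
    },
    "Quantitative": {
        "Qn-c": "frequencies in quantitatively different classes of a continuum",
        "Qn-d": "frequencies in quantitatively different classes of a discontinuum",
        "Qn-o": "frequencies in the ordered classes of a continuum",
    },
}


def major_type(code):
    """Return the major type of a series code. Ratio and index variants such as
    'T-t-r' are treated as falling under their parent code 'T-t'."""
    for group, codes in SERIES_TYPES.items():
        for base in codes:
            if code == base or code.startswith(base + "-"):
                return group
    raise KeyError(code)


print(major_type("T-t-r"), major_type("G-d"), major_type("Qn-o"))
```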

PROBLEMS 

As a vocabulary review the student may test 
his knowledge of the following words and phrases: 

basal year
categorical series
center of population
character
characteristic
collection of data
continuous
continuum
counter sample
contrast sample
cycle
discrete
fair sampling
frequencies in a class
geographical series
growth series
index
natural limits
periodicity
price index
price ratio
price relative
probability
pseudo-time series
qualitative series
quantitative series
ratio
sampling
series
significance
spatial series
statistical series
tabulation of data
temporal series, or time series
terminal statistics
trait
trend
zero point


Think of series of the sorts T-t, T-c, etc., of
the preceding Section 4. In one or two sentences
describe each. The effort should be made to think
of simple situations. When giving an illustration
of one sort, attempt to choose data which are so
simple that no other series type is important in
connection with it.


CHAPTER III 

STATISTICAL TABLES 

SECTION 1. CHARACTERISTICS OF A GOOD TABLE
IN GENERAL

The chapter which follows deals with graphic
methods and is concerned with charts, diagrams,
graphs, etc., constituting pictorial
representations of statistical series. The
statistical table is quite different. Its purpose
is not directly to give a picture of a sequence,
but to provide the basic data from which such a
picture, or at least the outstanding features of
such a picture, may be determined and visualized
if desired. The statistical table is simply a
shorthand statement of facts. If a thousand or so
facts of the sort "The population of Aaber County
is 4000," "The population of Anthony County is
3200," "The population of Avery County is 4800,"
etc., are to be presented, not only can they be
shown more concisely by tabulation, but several
thousand additional facts, such as "The population
of Aaber County is 800 larger than that of Anthony
County," are presented at the same time and in an
agreeably compact manner. The desire to
accomplish double, triple, or manifold
presentation by a single tabular arrangement is
the desideratum which imposes conditions and
determines appropriateness of procedure.
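The additional facts a table presents implicitly are the comparisons between its rows: every pair of stated facts yields one more. A minimal Python sketch using the county populations quoted in the tables of this chapter (the generated wording is illustrative):

```python
from itertools import combinations

# County populations (1920) from the tables of this chapter.
populations = {"Aaber": 4000, "Anthony": 3200, "Avery": 4800, "Bascomb": 16000, "Brown": 3000}

# Each pair of rows yields a further fact the reader can read off the table.
facts = []
for (a, pa), (b, pb) in combinations(populations.items(), 2):
    larger, smaller = (a, b) if pa >= pb else (b, a)
    facts.append(
        f"The population of {larger} County is {abs(pa - pb):,} "
        f"larger than that of {smaller} County"
    )

for fact in facts:
    print(fact)
n = len(populations)
print(f"{n} stated facts imply {n * (n - 1) // 2} pairwise comparisons")
```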

The same facts in regard to population are 
shown in the following five tables, and while 
not exhausting the possibilities of presentation, 
these will suffice to show the wide option which 
exists in presenting very simple data. 


TABLE III A
Populations and Areas of Counties

Counties     Population     Area in
                1920        Sq. Miles
Aaber           4,000          480
Anthony         3,200          400
Avery           4,800          800
Bascomb        16,000          700
Brown           3,000          600


TABLE III B
Populations and Areas of Counties

Counties      Area in      Population
             Sq. Miles        1920
Aaber           480           4,000
Anthony         400           3,200
Avery           800           4,800
Bascomb         700          16,000
Brown           600           3,000


TABLE III C
Counties arranged according to Population

Counties     Population 1920
Brown             3,000
Anthony           3,200
Aaber             4,000
Avery             4,800
Bascomb          16,000


TABLE III D
Counties arranged according to Population

Counties     Population 1920
Bascomb          16,000
Avery             4,800
Aaber             4,000
Anthony           3,200
Brown             3,000


TABLE III E
Counties arranged according to Population

Population 1920     Counties
     16,000         Bascomb
      4,800         Avery
      4,000         Aaber
      3,200         Anthony
      3,000         Brown
As judged by a single purpose, no two of the
tables given are equally meritorious. If the
table is to be used more frequently in abstracting
information about various counties than as a
means of comparing counties, i.e., if it is a
reference table and not one pointing some
conclusion, the items in the stub (the first
column) should be arranged alphabetically, as in
Tables III A and III B, in order to facilitate the
finding of items desired. If populations are more
likely to be studied than areas, Table III A is
preferable to Table III B, as the Population
column holds a dominant position in Table III A.

Should it be intended that the table be not
primarily a reference table arranged to simplify
the extraction of items of information, but, let
us say, to point conclusions with reference to
populations, Tables III C, III D, or III E are
preferable to Tables III A or III B. If counties
of large population are the chief consideration,
Table III D is preferable to Table III C, as the
first row of a table ranks higher in dominance
than successive rows. Next in importance is the
last row. Totals or averages are, because of
their importance, frequently placed in the first
row, but if other items demand this position, or
if captions (headings of columns) are less readily
interpreted when separated from the body of the
table by a row of totals or averages, then the
bottom row may be used.

As a means of pointing conclusions dependent 
upon populations, Table III E is to be preferred 
to Tables III C or III D, as the population data 
hold the dominant position in Table III E. 
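The five arrangements differ only in sort order and column order, so each can be derived from the same primary rows. A minimal Python sketch using the county figures of Tables III A to III E (the variable names are illustrative):

```python
# Primary rows: (county, population in 1920, area in sq. miles).
rows = [
    ("Bascomb", 16000, 700), ("Aaber", 4000, 480), ("Brown", 3000, 600),
    ("Avery", 4800, 800), ("Anthony", 3200, 400),
]

# Reference order, as in Tables III A and III B: alphabetical stub.
reference = sorted(rows, key=lambda r: r[0])

# Table III C: ascending population.
by_population_asc = sorted(rows, key=lambda r: r[1])

# Table III D: largest population in the dominant first row.
by_population_desc = sorted(rows, key=lambda r: r[1], reverse=True)

# Table III E: population moved into the dominant (first) column.
table_iii_e = [(pop, county) for county, pop, _ in by_population_desc]

for pop, county in table_iii_e:
    print(f"{pop:>8,}  {county}")
```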

In general one should so draw up the table 
that the items in the stub and the captions 
constitute the argument or information with which 
the table is entered, and so that the column and 
row next to the stub and captions contain the 
most important items to be obtained from the 
table. Rows and columns more removed from these
dominant positions should contain less important 
data, except that the last row and last column 
may be given to data of first or second importance. 

Such tables as III A and III B are primary or
general-purpose tables, since they contain the
raw data without abridgment and may be used for
various purposes. Such tables as III C, III D,
and III E are derived from primary tables, such as
III A and III B, and by emphasizing certain facts
serve a special purpose. These two types of
tables should be recognized. The special-purpose
table is always published because it conveys the
point of the study. The general-purpose table
should always be published also, as it provides
the only means of checking the author and of
discovering whether other or further conclusions
can be drawn. Several tables and many calculations
may be involved between the primary and the final
derived table. If a full description of these
intermediate steps is given, it is not essential
that these intervening tables and calculations
be published.

We have indicated that there are three types
of tables: (a) the general-purpose table, or
table presenting original data without intent to
point a conclusion or emphasize some feature at
the expense of others; (b) the intermediate
table, or table which is derived from the
general-purpose table in the process of putting
the data in such form as to lead to a conclusion;
and (c) the special-purpose table, or table
derived from (a) and (b) so as to bring into
relief some conclusion. Now let us briefly note
what constitute desirable characteristics of all
three, and then of each separately.

Principles of dominance and labeling are the
same for all three sorts of tables. The approved
lettering is such as can be read if the observing
eye is in front of the bottom of the page. If
demands of typography require vertical lettering,
it should always be placed so as to be read from
the right-hand margin; that is, the eye is
considered to be at the right, and not at the
left. This principle of locating the observing
eye at the bottom or right applies equally to the
lettering of graphs, whether horizontal, vertical,
or oblique. The title should be at the top of the
page. When placed at the bottom, as is
occasionally done, it is in a position of second
dominance. The title should be brief and answer
the following question, which we may always assume
is in the reader's mind: "Is the subject matter
of this table such that I am interested in looking
at the table?" If brevity of title is incompatible
with accuracy, there may frequently be employed a
subtitle in smaller type, or a footnote, thus
preserving the brevity and large type size of the
main title. The title is the thing that the eye is
supposed to rest on first as the page is viewed.
This same principle applies to titles of graphs.

There are always two means of entry into the 
table. One is given by the items in the stub 
(the left-margin column of the table) and the 
other by the items in the captions of the columns. 
Generally the means of entry involving the larger 
number of items is made the stub, so that the 
smaller number of items may be indicated in the 
captions. The column next to the stub is the 
dominant column, while the next in dominance is 
the last column or one at the extreme right of 
the table. The dominance of any column may be 
increased by enclosing it in heavy rules. The 
row next to the captions is the dominant row, 
while the next in dominance is the last row. The 
dominance of any row may be increased by heavy 
ruling or leading. 
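The rule that the means of entry with more items becomes the stub can be stated as a small transformation of a two-way table. The Python sketch below is a hypothetical helper (`orient` is an invented name), shown on the county figures entered the "wrong" way round, with the two attributes as rows and the five counties as columns.

```python
def orient(table, row_keys, col_keys):
    """Arrange a two-way table so that the means of entry with more items
    forms the stub (rows) and the one with fewer items forms the captions.

    `table` maps (row_key, col_key) -> value.
    Returns (stub_keys, caption_keys, rows_of_values).
    """
    if len(col_keys) > len(row_keys):
        # Swap the two means of entry so the longer one becomes the stub.
        row_keys, col_keys = col_keys, row_keys
        table = {(c, r): v for (r, c), v in table.items()}
    rows = [[table[(r, c)] for c in col_keys] for r in row_keys]
    return row_keys, col_keys, rows


counties = ["Aaber", "Anthony", "Avery", "Bascomb", "Brown"]
attributes = ["Population 1920", "Area in Sq. Miles"]
values = {"Aaber": (4000, 480), "Anthony": (3200, 400), "Avery": (4800, 800),
          "Bascomb": (16000, 700), "Brown": (3000, 600)}
# Input keyed (attribute, county): attributes as rows, counties as captions.
table = {(attr, county): values[county][i]
         for i, attr in enumerate(attributes) for county in counties}

stub, captions, rows = orient(table, attributes, counties)
for county, row in zip(stub, rows):
    print(f"{county:<10}" + "".join(f"{v:>10,}" for v in row))
```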
SECTION 2. SPECIAL CHARACTERISTICS OF THE THREE
TYPES OF TABLES 

General-purpose table: There are certain
special requirements of the general-purpose table.
It is to present the original data in all its
detail except for items considered to be
immaterial. For example, if John Doe, age 13.5,
seated at desk 17 of Mary Brown's fifth-grade
class, secures a test score of 86, it may well
be that the general-purpose table merely records
the sex, the age, the grade, and the score, it
being assumed that the name of the boy, that of
his teacher, and the desk where seated are
irrelevant with reference to any of the sundry
purposes with which readers may approach the
table. The author may be incorrect, for some of
his readers might be interested in, say, the
teacher as related to pupils' scores. However
this may be, it is clear that it is incumbent
upon the author to anticipate such potential
uses of his data and, in the light of the variety
of usage that he can anticipate, to construct his
general-purpose table. He should seldom limit
the presentation so as to cover the single type
of use to which he himself has put the data. If
he does so limit it, then the table is still of
value in permitting a reader to verify the
author's computations, but this is probably a much
narrower value than could have been given to it.

It should not be supposed that in publishing a
general-purpose table the author should have no
purpose; rather, he should have a variety of
purposes, as many as the uses that he can
anticipate or imagine. He should subordinate his
own special purpose and serve these many purposes
in the arrangement of the table.

Of first importance is consideration of the 
means of entry which his reader will be called 
upon to use. 
The deviser of the general-purpose table must
think of the concepts and classifications which
he can count upon finding in the minds of his
readers. He can certainly count upon their
knowledge of the alphabet. He can count upon
their knowledge of chronological order, and, with
somewhat less certainty, knowledge of certain of
the major geographical relationships. He can
believe that certain classes such as "live stock,"
"cereals," "physical measurements," "mental
measurements," "maintenance costs," "instructional
costs," etc., will be intelligible and will include
about the same items for all of his readers.
In general, the arrangement and classifications
of stub and captions are to be such as call upon
knowledge of the sort mentioned. Alphabetic
entry is, in particular, of wide utility. Such
emphasis as results from the arrangement of the
table should be for the sake of ease of entry to
the table, and not to emphasize some item therein.

A logical arrangement involving the concepts 
of subordination and coordination is usually a 
psychological aid to entry. For example, captions 
such as these 


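                          FARM PRODUCTS
------------------------------------------------------------------
            LIVE STOCK             |             FRUIT
-----------------------------------+------------------------------
CATTLE  HOGS  SHEEP  HORSES  OTHER | APPLES  PEACHES  PEARS  OTHER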


provide a number of coordinates under a single
superordinate. The ruling and type size bring
into relief the classification used and required
of the reader in order properly to enter the
table. Obviously the principle to follow is to
make contiguous those rows or columns that will
generally be used together. Another principle to
be kept in mind, though it frequently must be
violated, is that normal reading habits and
muscular development of the eye favor horizontal
eye movement rather than vertical.
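A hierarchy of captions such as the FARM PRODUCTS example is naturally a nested structure, and flattening it left to right keeps columns under the same heading contiguous. A minimal Python sketch; the nested-dictionary encoding and the `leaf_columns` helper are illustrative choices, not the author's.

```python
# Caption hierarchy from the example: a superordinate heading with coordinate
# subheadings, each subdivided into the column captions.
captions = {
    "FARM PRODUCTS": {
        "LIVE STOCK": ["CATTLE", "HOGS", "SHEEP", "HORSES", "OTHER"],
        "FRUIT": ["APPLES", "PEACHES", "PEARS", "OTHER"],
    }
}


def leaf_columns(tree, path=()):
    """Flatten a caption hierarchy into full column paths, left to right, so
    that columns under the same heading stay contiguous."""
    if isinstance(tree, dict):
        for heading, subtree in tree.items():
            yield from leaf_columns(subtree, path + (heading,))
    else:
        for leaf in tree:
            yield path + (leaf,)


columns = list(leaf_columns(captions))
for col in columns:
    print(" / ".join(col))
```

The full path also distinguishes the two columns both captioned OTHER, which the reader otherwise tells apart only by the superordinate heading above them.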

In the general-purpose table there should be
emphasis on the natural arguments used by, and the
presumed interests of, the reader, while in the
special-purpose table there may well be emphasis
on the argument used by the author and upon his
special interest. In the general-purpose table,
rulings, leadings, and placing in dominant
positions should be in the light of the reader's
already established modes of thought, so that
nothing new is being taught him, while in the
special-purpose table these same things are used
with the view to teaching him something, or giving
an emphasis that is new to him.

Special-purpose table: The essential
characteristics of the special-purpose table have
already been indicated. It is designed to point
a conclusion. It generally consists of derived
statistics and presents a difference between two
or more statistics. It may well be that there
are many intermediate processes of assembly and
computation between the items of the
general-purpose table and the resulting
special-purpose table. A sample of these
intermediate processes, or a very thorough
description of them, may suffice. In any case,
enough should be given so that the doubting reader
will be able to start with the data of the
general-purpose table and verify all the
statistics given in the special-purpose table. It
is obvious that the special-purpose table is
always published, for it constitutes the
conclusion made by the investigator. There is an
equal requirement, in an article claiming
scientific value, that the general-purpose table
be published. It may sometimes suffice in
semiscientific publications intended for popular
consumption to forgo publication of the original
data, if reference is made to some source, perhaps
a more technical journal, where it is available.
This procedure is a makeshift and not in general
to be recommended.

In the desire to place emphasis, great care
must be exercised to prevent distortion. For
example, a table admittedly emphasizing the size
and importance of Myberg might be as follows:

Cities      Population
Myberg      2 , 1 7 4
Ladome         14,134
Momcup         11,655
Defton          6,758
Frilty          2,084

The emphasis upon Myberg has been accomplished
by recording it in the dominant row. This, in
itself, is not to be criticized, but emphasis by
recording the population in type more expanded
horizontally than that used for the cities with
which Myberg is being compared is misleading; or,
if the reader discounts the type size so
accurately as not to be misled, it has nevertheless
been annoying to him. A better presentation is as
follows:

Cities      Population
Ladome         14,134
Momcup         11,655
Defton          6,758
MYBERG          2,174
Frilty          2,084

Here again the author has emphasized Myberg,
but he has now done so without sacrificing
accuracy. The significance of the population of
Myberg is only sensed when the contrast statistics
are known. Thus it is equally important that the
reader know the population of the other cities.
These should accordingly be given in such type and
in such arrangement as best to enable comparison
with the population of Myberg. Whereas the item
of entry to a general-purpose table has been left
to the particular interest of the reader, an
endeavor has been made in the special-purpose
table to dictate this, in so far as the author is
able to do so, by utilizing the available means of
emphasis. Though the author may legitimately
dictate the issue to be attended to in the
special-purpose table, he is never justified in
misrepresenting any quantitative relationships
inherent in the data, or even in suppressing by
omission or relegating to obscure positions data
which are of most comparative importance.
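The better presentation emphasizes the featured city by capitals alone, while every population keeps its rank position and is printed in the same width and style. A minimal Python sketch of that rule, using the Myberg figures as read from the example tables; the function name is hypothetical.

```python
# City populations from the Myberg example.
cities = {"Myberg": 2174, "Ladome": 14134, "Momcup": 11655, "Defton": 6758, "Frilty": 2084}


def emphasize_without_distortion(pops, featured):
    """List cities in order of population, marking the featured city by
    capitals only; every population is printed in the same width and style."""
    lines = []
    for city, pop in sorted(pops.items(), key=lambda kv: kv[1], reverse=True):
        name = city.upper() if city == featured else city
        lines.append(f"{name:<8}{pop:>8,}")
    return lines


for line in emphasize_without_distortion(cities, "Myberg"):
    print(line)
```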

Intermediate tables: Such intermediate tables
and computations as have been employed in reaching
the conclusions shown in the special-purpose
table, or in other terminal statistics, represent
the development of the plot, as it were. The
author of a short story may revise and refine
some elementary plot a score of times before it
finds expression in his published work. The more
hidden, though sound, the steps in its
development, and the cleverer it is, the more
appealing to the reader. Unhappily, in the
scientific study no concealment of the steps of
development is permissible. These should be fully
described and, if quite involved, illustrated by
examples which may include both tables and
computations. The scientist is called upon to
belittle his own genius by a full demonstration of
the directness and simplicity of his processes. If
he does otherwise he may be fawned upon by the
Ladies' Aid or the Up-and-Coming Dinner Club, but
he will hardly aid the efforts of serious
students. The student should remember that
verifiable knowledge is the object of his
endeavor, and the verification should be by
others. Common experience indicates that
occasionally there are capable students whose
enthusiasm or whose distaste for laborious
presentation leads them to jump from finding to
finding without stopping to prove their course,
with the net outcome of detachment from, and lack
of influence upon, fellow students in the same
field.

It happens not infrequently that some statistic
readily derived from the data given in a
general-purpose table will be used so generally as
to warrant its inclusion in such a table. For
example, if the populations of the states of the
United States constitute the data of an original
table, there surely should be recorded in this
table the population by divisions, and finally
the population of the entire United States. These
are derived statistics and do not constitute
basic items in a general-purpose table. It may
be believed, however, that they will be used
almost universally by the readers of such a table.
It is thus a courtesy to the reader to make the
necessary additions for him, and to record these
totals. In so far as this is done, the table has
taken on a special-purpose function, one amply
justified by the generality with which these
totals will be desired by readers in pursuing
their separate interests. When such derived
statistics are of such importance as to warrant
inclusion in what is otherwise a general-purpose
table, they generally warrant being placed in a
dominant position: the row next to the captions,
the column next to the stub, or the last row or
last column. They may even warrant a bolder
typeface or different leading than the more
detailed or primitive data which are basic to the
table. Table IV S illustrates this, and it also
gives the standard of U. S. Census practice in
dividing the United States into divisions and in
the order in which the divisions and states are
listed. The most common derivatives which may add
to the usefulness of a general-purpose table are
totals, arithmetic means (or other averages),
standard deviations (or other measures of
variability), proportions, and percentages. The
main principles involved in the construction of
statistical tables here discussed, as well as
certain other points, were found in Bowley (1926),
Crum and Patton (1925), Day (1920), and Young
(1925).

PROBLEMS 


Problem 1. The student may test his knowledge
of the subject matter of this chapter by giving
himself a vocabulary test involving the following
words and phrases:

general-purpose table
special-purpose table
intermediate table
original table
derived table
dominance
order of dominance of portions of tables
stub
caption
observing eye
terminal statistics


Problem 2. Draw up a general-purpose table, 
presenting data as here described: In connection 
with a certain study the following items of in- 
formation for 300 American white delinquent boys 
in a Texas juvenile training school, ages 8 to 
20, were gathered. The order in which they are 
here mentioned is haphazard. 

name
height in mm.
length of head in mm.
strength of grip of right hand in lbs.
age in completed years and days of a partial year
score on a constructive ability test
speed of tapping (30 seconds with the preferred hand)
strength of vision of left eye (Snellen chart test)
circumference of head in mm.
lung capacity in cu. inches
school grade
handedness (left or right)
strength of vision of right eye
strength of grip of left hand
number of scars on cranium
mental test score
breadth of head in mm.
pubertal development (pre-pubescent, pubescent, or post-pubescent)
weight in pounds

Assume that the users of the table do not know 
the names of the boys, so that an alphabetical 
arrangement by name of boy will not be service- 
able. The steps to follow are: 

1. Choice of title with or without subtitle. 

2. Selection of the two main lines of classi- 
fication, and allocation of one to the stub 
and the other to the captions. 

3. The classification of items within the stub 
in harmony with the knowledge of the reader. 

4. A similar classification of items in the 
captions. 

5. An arrangement of items of stub in con- 
formity with the principle of placing the 
most generally used items in dominant po- 
sitions, and in keeping related items to- 
gether. 

6. A similar arrangement of the items of the 
captions. 

7. The use of special rulings, leadings, type 
size, bold face, italics, and capitali- 
zations to facilitate entry as it will pre- 
sumably be made by the reader. 

8. The inclusion or non-inclusion of certain
derived measures in the light of the
generality with which it is expected they
will be used.

9. The actual lettering of title, stub, and 
captions in order to supplement ease of 
entering, as planned for in the preceding 
eight steps, and in harmony with the princi- 
ple of the location of the observing eye. 

Problem 3. Draw up a special-purpose table, 
using the accompanying data taken from Brigham, 
Carl C., A Study of American Intelligence (1923),
p. 120. Your purpose is to show comparative 
intelligence test scores for certain Nordic and 
Mediterranean strains, having defined country of 
origin as representing strains as indicated below. 
Mean army intelligence test scores of certain 
drafted men according to country of origin were 
found to be as follows: 

Nordic: Belgium 12.8, Canada 13.7, Denmark 
13.7, England 14.9, Holland 14.3, Norway 
13.0, Scotland 14.3, Sweden 13.3. 

Mixed: Austria 12.3, Germany 13.9, Ireland
12.3, Poland 10.7, Russia 11.3.

Mediterranean: Greece 11.9, Italy 11.0,
Turkey 12.0.


What statistic or information in addition to 
that given in the preceding problem do you con- 
sider would be of prime value in serving the 
stated special purpose? (The answer to this 
question may appropriately be postponed until 
the next six chapters have been studied.) 
GRAPHIC METHODS

SECTION 1. GENERAL FIELDS IN WHICH SERVICEABLE

An essential difference between a written 
description of a landscape and a painting of it 
is that the reader exercises minimum initiative 
in determining what is noted, while the viewer 
of the painting, no matter how subtly and suc- 
cessfully the high lights have been drawn, does 
take an active and personal part in determining 
the focus of attention. The wider the experience
and training of the observer the more likely 
that the feature of most worth in the painting 
will be apprehended. For the fullest enjoyment 
not only must the artist have been an expert in 
design and execution, but the observer must be 
expert also. 

For fullest utility, the graphic portrayal of
statistical data is dependent upon skill in
presentation and commensurate skill in
interpretation. When both of these are present
the enjoyment and fullness of meaning gotten in
reading a graph exceed those in reading a
description of the same data. We can, very
reasonably, consider the meaning gotten from
reading a graph as a product of d, m, and r,
wherein d represents the data with its complete
meaning; m, a quantity between 0 and 1, the
excellence of the art of the maker of the graph;
and r, also between 0 and 1, the expertness of
the reader of it. Thus dmr < d if either m or r
falls short of 1.

If the reader of the graph is quite untutored,
the maker must resort to stars, arrows, colors,
heavy lines, light lines, shading, verbal addenda,
pictograms, etc., to make the story clear and
make the reader's task as light as possible. If
the reader is inexpert he is correspondingly
gullible, and the maker is under a heavy
obligation not to resort to the little tricks of
bias, under- and over-emphasis, so available and
so misused by the promoter.

Various graphic devices are listed in Table
IV A, arranged vertically according to felt ease
of understanding, that is, simplicity as the lay
reader conceives it, and horizontally according
to field served.

Distorted pictures, maps, and cartoons with a
quantitative element may be designed to relieve
the reader of the need to search for, and
correctly appraise, the significant feature, but
they may also be designed to suppress or
exaggerate phenomena and to prevent correct
appraisal. The demarcations between the
inaccurate and the accurate, and between the
popular and the technical, in graphic portrayal
are hairlines which, though willfully crossed by
promoters, are also often crossed inadvertently
by amateur statisticians.

When the nature of the data permits, the 
picturing of facts conveys a readier comprehension 
than does a tabular array of figures. Since there 
are but two dimensions to the surface of a sheet 
of paper, ordinarily but two variables are
depicted in a single graph.

SECTION 2. TEMPORAL SERIES

Time chart: Consider the accompanying data 
giving the maximum temperatures recorded by the 
Weather Bureau for each day in July and August, 
1917, for New York City. 

TABLE IV B

MAXIMUM TEMPERATURE FOR EACH DAY IN
DEGREES FAHRENHEIT

July 1-Aug. 31, 1917
New York City

July  1   80    July 17   87    Aug.  1   98    Aug. 17   85
      2   88         18   80          2   96         18   80
      3   74         19   77          3   83         19   81
      4   78         20   83          4   80         20   84
      5   81         21   81          5   82         21   85
      6   80         22   86          6   82         22   80
      7   79         23   86          7   88         23   76
      8   70         24   86          8   78         24   83
      9   75         25   84          9   83         25   82
     10   65         26   85         10   80         26   74
     11   66         27   90         11   82         27   82
     12   71         28   80         12   83         28   80
     13   81         29   81         13   83         29   83
     14   81         30   95         14   78         30   81
     15   75         31   98         15   81         31   75
     16   85                         16   80

If it is desired to study day-to-day changes in
maximum daily temperatures, a graph is made in
which the abscissa represents the days in order,
July 1, July 2, etc., and the ordinate represents
the temperatures, 0°, 1°, 2°, etc. For July 1
the ordinate is 80, for July 2, 88, etc. A line
connecting the successive ordinates completes the
graph. Chart IV I is such a graph and is called a
Time Chart. Its two dimensions are time and the
magnitude of something as it changes with a change
in time.

Certain detailed features of charts, zero
points, scales used, rulings and proportions:
As the initial and final dates do not represent
the beginning and end of time, but are determined
by the author's whim, there should be no heavy
vertical rules bounding the curve. A little
space should be left between the initial plotted
point and the left vertical rule of the chart
and between the final point and the right rule
of the chart, as this gives the entirely correct
impression that the curve is dangling in time and
that both earlier and later observations were
possible and might have been plotted. The double
wavy line near the bottom of the chart is drawn
to convey the information that the temperature
scale has been broken and that the actual zero
point on the temperature scale is much lower down
than indicated by the heavy horizontal rule at
the bottom of the chart.

The omission of a large portion of a vertical
scale is sometimes dictated by limitations of
space available. For accurate portrayal it is
desirable that a natural zero point be the zero
point of the raw scores or measures and that
graphic distances be correct representations of
distances above this point. Following this
principle in the case of these temperature data,
we measure temperatures from absolute zero, -460°
Fahrenheit. The temperatures then appear: July 1,
540° above the zero point; July 2, 548° above;
etc. The vertical dimension would then be some
nine times as great as shown unless this scale
is reduced, and if it is reduced to 1/9 the
Chart IV I scale we get Chart IV II, which is not
very helpful to a prospective summer visitor to
New York or to a salesman of air-conditioning
equipment.
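The rescaling is simple arithmetic. A minimal Python sketch, using the rounded figure of -460° Fahrenheit for absolute zero exactly as the text does (the helper name `above_absolute_zero` is illustrative, not the author's):

```python
# Absolute zero, rounded as in the text, in degrees Fahrenheit.
ABSOLUTE_ZERO_F = -460

def above_absolute_zero(temps_f):
    """Express Fahrenheit readings as degrees above absolute zero."""
    return [t - ABSOLUTE_ZERO_F for t in temps_f]

# July 1 and July 2, 1917 (Table IV B).
print(above_absolute_zero([80, 88]))  # [540, 548]
```

Measured this way, a day-to-day swing of 8° is a change of well under 2 per cent of the plotted height, which is why the curve of Chart IV II looks nearly flat.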
Natural zero points for one purpose may not
be natural for another. For a physicist, -460°
may be a natural zero point on the temperature
scale. For a banana planter, perhaps 32° is the
important and, for his purposes, the natural
point of reference. For a physician or hospital
nurse, perhaps 98.6° is such a point. The writer
will not attempt to define "natural zero point,"
for the definition must depend upon the
particular phenomena and purpose in hand, but he
would point out that there are such points and
that they are the points from which scores or
measures should deviate for the most meaningful
presentation. In the field of prices an obvious
zero point is $0.00. In situations where the top
value is necessarily limited, as is, for example,
100 per cent of a quantity, this value becomes a
natural point of reference. There may be two
natural points on the scale. In dealing with
quantities that are percentages of a total, both
0 per cent and 100 per cent are such points.
It is to be noted that in the matter of maximum
temperatures 100° is not a limit, and Charts IV I
and IV II as drawn are not bounded at the top by
a heavy horizontal rule. Where there is such an
upper limit a heavy rule representing it is
desirable. The ogive curve shown in Chart IV XIII
is representative of this situation, it being
bounded at 0 and 100 per cent by heavy rules
at left and right. If, for the artistic purposes
of make-up, it seems desirable to bound the curve
with a rule at the top, it should be well above
the maximum ordinate of the curve so as to give
as little impression as possible of being a part
of the data.

Ordinarily the vertical rulings of a time
chart should be without emphasis, as one date is
neither more nor less important than another. An
exception can be made in the case of a relative
time chart, as shown in Chart IV IV, where prices
or magnitudes are expressed relative to those at
some basal date, the price or magnitude at this
date customarily being called 100. In this
instance the vertical rule at this date may be
drawn heavier than for other dates, as shown in
Chart IV IV. A second instance arises in
connection with growth curves, for here the
beginning, such as the date of birth, is a
natural starting point and may well be indicated
on the chart by a heavy vertical rule.

For all the common charts, except circle
diagrams, the scales used are at the discretion
of the maker. It has been claimed that the
greatest artistic balance is reached if the ratio
of over-all height to over-all width is the
golden mean provided by the extreme and mean
ratio. This ratio, x/(1 - x) = (1 - x)/1, holds
when (1 - x)/1 = .618/1.00, or approximately 3/5.
General experience supports the view that a
pleasing result is obtained if the height of a
curve is 3/5 its width. In Chart IV II, if
attention is fixed upon the curve only and not
upon the bounding rules, the ratio of height to
width is about one to thirty, which is altogether
undesirable.
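The figure .618 follows from solving the stated proportion. Taking x as the smaller part of a unit length, the steps are:

```latex
\frac{x}{1-x} = \frac{1-x}{1}
\;\Longrightarrow\; x = (1-x)^2
\;\Longrightarrow\; x^2 - 3x + 1 = 0
\;\Longrightarrow\; x = \frac{3-\sqrt{5}}{2} \approx .382,
\qquad 1-x = \frac{\sqrt{5}-1}{2} \approx .618 \approx \frac{3}{5}
```

The other root, (3 + √5)/2, exceeds 1 and so cannot be a part of a unit length.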

The lettering of a chart follows the same 
principles as those applying to a table. A chart 
may require an explanatory "legend" as shown in 
Chart IV IV. The upper left corner is an excel- 
lent place for such a legend, but it may be 
placed in any free region. Vertical and hori- 
zontal scale rulings should be reduced to a 
minimum and given in very light lines, except
that some slight emphasis may be given to round- 
number rulings usually represented by every fifth 
or every tenth line. 

Limitations of the time chart: Let us plot
the several data of Table IV C upon a single time
chart. The result is shown in Chart IV III,
where three different ordinate scales have been
employed, one for wages, one for steak prices,
and one for wholesale prices.

TABLE IV C

PRICE AND WAGE DATA
CHICAGO*

        U. S. Entire:   Av. Yearly      Union Wage per Hour
        Dunn's          Retail Price    Painters    Linotype
        Wholesale       Round Steak                 Operators
Year    Price Index

1907    $107.264
1908     113.282
1909     111.848
1910     123.434
1911     115.102
1912     123.438
1913     120.832
1914     124.528
1915     124.168
1916     137.656

* U. S. Dept. of Labor, Bur. of Labor Statistics,
Union Scale of Wages and Hours of Labor, 1916.

CHART IV III

INCREASES IN WHOLESALE PRICES, RETAIL PRICES AND WAGES
CHICAGO, 1907-1916

The advantage of this presentation over one
showing three charts lies in a saving of space
and in nothing else. In fact, there is a decided
disadvantage in having three such separate scales
upon a single chart. Accurate comparisons between
the wholesale, retail steak, and wage curves are
not possible by this chart, though the existence
of the three curves in a single chart encourages
the reader to attempt to make them. The test of
any graphic presentation is that the judgment of
magnitude consequent to the visual impressions
received is correct. By this test Chart IV III
is nearly a complete failure: the comparison of
the wage of linotype operators with that of
painters is the only inter-curve comparison that
is sound.

Not infrequently interest in, or, we may say,
importance for our purposes of, temporal data
decreases as more and more remote time is
considered. This decrease may be of much the same
order as the decrease in size when things are
viewed in perspective. When temporal data are
shown in such perspective we have a retrospective
chart. It has points of similarity both with the
semi-logarithmic chart and the block diagram, as
discussed by Karsten and Brooks in "Retro" Charts
(1943).

The relative time chart: The relative time
chart is designed to enable comparisons of the
relative changes in different magnitudes
accompanying a change in time. It does permit
such comparison, but only with reference to some
definite date which has been chosen as the basal
date.

The basal year may be any date within or
outside the range of time presented. There are
four common bases for the choice of one certain
date rather than a second, and all of these have
the merit of choosing a moment in time when the
phenomena are, in some respect, more stable than
at neighboring moments. Temporal phenomena have
the habit of oscillating, though usually very
irregularly, as time changes. Such a function,
if continuous, has a moment of no change at each
maximum and minimum and a moment of uniform
change at each point of inflexion. These points
have features of stability which are
advantageous in a point of reference; the point
of inflexion of a smoothed temporal series would
seem to give an especially excellent and stable
point for a study of trends. In economic
phenomena the height of inflation, the bottom of
depression, and a date between the two when the
rate of change seems regular are common basal
dates. The fourth reason for a certain choice is
custom, which in turn is commonly related to one
or another of the three conditions mentioned,
though it may be so related in terms of much
more varied data than that given by a single
variable.

We will show increases of the prices and wages 
recorded in Table IV C relative to their values 
in 1907, which date becomes the basal year. We 
first compute, from the data of Table IV C, the
ratios as given in Table IV D and then plot as
shown in Chart IV IV.
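The wholesale column of Table IV D can be reproduced from Table IV C by a short computation. A minimal Python sketch (the function name `relatives` is illustrative); it multiplies each ratio by 100 and rounds to a whole number, the convention described in the footnote to Table IV D:

```python
# Dunn's wholesale price index, U. S. entire (Table IV C).
WHOLESALE = {
    1907: 107.264, 1908: 113.282, 1909: 111.848, 1910: 123.434,
    1911: 115.102, 1912: 123.438, 1913: 120.832, 1914: 124.528,
    1915: 124.168, 1916: 137.656,
}

def relatives(series, base_year):
    """Ratio of each value to the base-year value, times 100, rounded."""
    base = series[base_year]
    return {year: round(100 * value / base) for year, value in series.items()}

for year, ratio in relatives(WHOLESALE, 1907).items():
    print(year, ratio)
```

Run as written, it prints 100, 106, 104, 115, 107, 115, 113, 116, 116 and 128 for 1907 through 1916, agreeing with the first column of Table IV D. Passing a different `base_year` re-expresses the same series on that base.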

CHART IV IV 
TABLE IV D

PRICES AND WAGES EXPRESSED AS RATIOS,* 1907 AS BASE

CHICAGO DATA

        Dunn's       Av. Yearly      Union Wage per Hour
        Wholesale    Retail Price    Painters    Linotype
Year    Price Index  Round Steak                 Operators

1907    100          100             100         100
1908    106          104             100         100
1909    104          111             110         100
1910    115          113             120         100
1911    107          111             120         100
1912    115          134             120         100
1913    113          141             130         100
1914    116          156             140         100
1915    116          148             140         100
1916    128          158             140         100

* The decimal point is omitted, as is usual, so that a "ratio"
of 106 in fact means 106/100.

The vertical rule for 1907 is made heavy be- 
cause this date is different from any of the 
other dates, it being the basal year. Also the 
horizontal rule for 100 is heavy because this 
also is unique, all distances above or below 
being correct measures of percentage change from 
the 1907 status. 

The relative time chart has certain inherent
shortcomings. First, changes relative to some
date other than the basal date are frequently of
interest but are not shown; second, a given
distance below the 100 rule, representing as it
does the same proportionate change as an equal
distance above, becomes misleading. If the 1907
wholesale price, $107.26, doubles, it increases
100 per cent, becoming $214.52. If this price
halves, it decreases by but 50 per cent. The
price must fall to $0.00 to decrease 100 per cent.
A change to $214.52 and a change to $0.00 are
represented by the same distance on the relative
time chart, but certainly the economic upheaval
in the one case is in no sense equal, but opposite
in nature, to that represented by the other.
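The contrast between percentage distances and logarithmic distances can be checked directly. A minimal Python sketch using the rounded 1907 price of $107.26 from the text (function names are illustrative):

```python
import math

def percent_change(old, new):
    """Change from old to new as a percentage of old (the relative chart's measure)."""
    return 100 * (new - old) / old

def log_change(old, new):
    """Change from old to new as a difference of common logarithms."""
    return math.log10(new / old)

BASE = 107.26  # 1907 wholesale price, as rounded in the text
for label, new in (("doubles", 2 * BASE), ("halves", BASE / 2)):
    print(f"{label}: {percent_change(BASE, new):+.0f} per cent, "
          f"log change {log_change(BASE, new):+.4f}")
```

Doubling and halving come out as +100 and -50 per cent, yet as log changes of +.3010 and -.3010: equal and opposite, which is the symmetry a semi-logarithmic scale provides.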

Advantageous as the relative idea is, it is 
nevertheless grossly inadequate if great ine- 
quality in relative change of items is present. 
The relative time chart technique should be 
limited to situations wherein percentage changes 
are small, or at least of a somewhat uniform 
nature for the different variables involved. 
Where relative changes are large, a more informa- 
tive graphic presentation, though one calling 
for greater knowledge upon the part of the reader, 
is by means of a semi-logarithmic time chart or
semi-logarithmic relative time chart.

Semi-logarithmic charts: The characteristic
of semi-logarithmic charts is that the scaling
in one dimension, usually the ordinate, is
proportional to the logarithms of a variable,
while in the other dimension, which is frequently
time, the variable itself is represented. If the
scaling in both dimensions is logarithmic the
chart becomes a logarithmic chart. Since, for
all positive numbers, differences in the
logarithms of numbers exactly parallel
proportional differences in the numbers
themselves, logarithms are peculiarly appropriate
in studying situations wherein equal
proportionate changes have equal psychological
or economic significance. Semi-logarithmic
cross-section paper is an aid in making these
charts, as it is then not necessary to look up
the logarithms of the magnitudes plotted. This
aid is quite unnecessary, as the looking up of
logarithms prior to plotting is a very simple
task. Even this small labor may be avoided, at
the cost of some accuracy, if equal spacings upon
ordinary cross-ruled paper are labeled with
numbers whose logarithms differ by approximately
equal amounts, as given in Table IV E for a
thirteen- and a twenty-division scale. Another
simple way to make one's own logarithmic paper is
to scale blank paper according to the divisions
on the slide of a slide rule.
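The labels of Table IV E can be generated rather than looked up. A minimal Python sketch, assuming each scale runs over one decade (the function name `equal_log_labels` is illustrative). Its unrounded values agree with the thirteen-division column of Table IV E to within 2 per cent, and, run from 100 to 1000, with the twenty-division "to 3 places" column to within one unit, the table's own entries being rounded:

```python
def equal_log_labels(low, high, divisions):
    """Numbers whose logarithms are equally spaced from low to high.

    Each label is the previous one times the constant factor
    (high / low) ** (1 / divisions), so equal distances on evenly
    ruled paper stand for equal ratios.
    """
    factor = (high / low) ** (1 / divisions)
    return [low * factor ** k for k in range(divisions + 1)]

for n in (13, 20):
    print(n, [round(v, 1) for v in equal_log_labels(10, 100, n)])
```

Because only the constant factor matters, the same labels serve any decade: multiply them by 10 for 100 to 1000, as in the last column of Table IV E.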


TABLE IV E

NUMBER VALUES CORRESPONDING TO EQUAL
LOGARITHMIC DIFFERENCES

Thirteen-division     Twenty-division scale
scale to 2+ places    to 2+ places      to 3 places

10      34.5          10      35.5      100     355
12      41            11      40        112     398
14      49            12.5    44.5      126     447
17      59            14      50        141     502
20      70            16      56        159     563
24      84            18      63        178     631
29      100           20      71        200     708
                      22.5    79.5      224     795
                      25      89        252     892
                      28      100       282     1000
                      31.5              316

A benefit derivable from equally spaced rulings,
labeled as in the thirteen- or twenty-division
scales given, is that two curves whose relative
changes are important, but which would be widely
separated if a single logarithmic scale were used,
may be brought close together by using an equally
spaced grid and labeling it differently for the
two variables. For example, the wholesale price in
dollars and the retail steak price in cents of
Table IV C and Chart IV V can be shown by two
curves in close proximity upon an equally spaced
grid by labeling a left-hand ordinate for
wholesale prices and a right-hand ordinate for
retail steak prices, as follows:


[Grid: left-hand ordinate for wholesale prices, labeled $100, $112, $126, $141, $159 at equal spacings.]


The wage and price data are plotted on semi- 
logarithmic cross-section paper in Chart IV V. 
Any year may be chosen as a base and the changes 
in heights of the curves noted for any other 
year in which interested. The distances measuring 
these changes are correct representations of the 
magnitudes of the proportionate changes between 
the two dates. This flexibility in choice of the 
basal year and the accuracy with which pro- 
portional changes are shown are merits not pos- 
sessed by the relative time chart. A disadvantage 
in Chart IV V as drawn is that a comparison of 
changes in different curves requires a comparison 
of linear distances that are considerably removed 
in space from each other. This difficulty can 
be avoided by plotting the curves upon tracing 
paper and then shifting each curve up or down as 
may be necessary until coincidence is attained 
for some desired date. 
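Shifting a curve on tracing paper amounts to subtracting one constant logarithm from every point, which is why any year may serve as base. A minimal Python sketch using four years of the wholesale column of Table IV C (the function name `log_relatives` is illustrative):

```python
import math

# Four years of the wholesale column of Table IV C.
WHOLESALE = {1907: 107.264, 1910: 123.434, 1913: 120.832, 1916: 137.656}

def log_relatives(series, base_year):
    """Logarithm of each value minus that of the base year.

    Changing base_year only adds the same constant to every entry,
    which is what sliding a curve up or down on tracing paper does.
    """
    base = math.log10(series[base_year])
    return {year: math.log10(value) - base for year, value in series.items()}

on_1907 = log_relatives(WHOLESALE, 1907)
on_1913 = log_relatives(WHOLESALE, 1913)
rise = on_1907[1916] - on_1907[1910]
print(round(rise, 4), round(on_1913[1916] - on_1913[1910], 4))
print(f"1910 to 1916: {100 * (10 ** rise - 1):.1f} per cent")
```

The 1910-to-1916 rise is the same distance on either base, and its antilogarithm gives the proportionate change directly.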


[Ordinate labels for retail steak prices: 12.6¢, 14.1¢, 15.9¢, 17.8¢, 20.0¢, 22.4¢, 25.2¢.]
CHART IV V

RELATIVE INCREASES IN WHOLESALE PRICES, RETAIL PRICES AND WAGES
CHICAGO, 1907-1916

[Logarithmic scale; years 1907 to 1916 on the abscissa.]
As a further illustration of the use of
logarithms, let us consider stocks having closing
prices as follows:

TABLE IV F

CLOSING STOCK QUOTATIONS

           Previous day's    This day's       Change
           closing price     closing price

Stock A    $12 1/2           $13              +1/2
Stock B    162 1/2           169              +6 1/2


Neither an examination of these figures that
involves no computation with them, nor an
examination of a time chart graphically showing
the prices, will be very helpful for such issues
as would ordinarily interest one. The fact that
Stock B increased $6 more than Stock A is proba- 
bly not of importance. The proportionate change, 
not the absolute change, in price is the item of 
moment to economist or investor. The computation 
of the ratio of the price at one date to that at 
a second, and a comparison of such "price ratios" 
is a common practice and does serve an immediate 
need, but in general is quite inadequate in that 
the second date, or basal date, must be changed 
with each change in issue or interest. The issues 
become quite simple and interpretations very 
direct if one but thinks in terms of logarithms 
of prices instead of the prices themselves. For 
the quotations cited the data are: 


TABLE IV G

LOGARITHMS OF CLOSING STOCK QUOTATIONS

           Logarithm of      Logarithm of
           previous day's    this day's
           closing price     closing price    Change

Stock A    1.0969            1.1139           .0170
Stock B    2.2109            2.2279           .0170
We immediately see that the proportionate
increases are equal. Without any change in
computation as the basal dates of interest
change, any two prices may be compared and
changes for different time intervals may be
compared. The difference given, .0170, is the
logarithm of the proportionate change in price.
The number whose logarithm is .0170 is 1.040,
informing us that the second price is 1.040
times the first, or that there is a 4 per cent
increase in price in the case of both stocks.
The relationships so clearly brought out by
logarithms are graphically portrayed when
amounts are plotted on coordinate paper with a
logarithmic scale. The statistically-minded will
hope that a daily paper will some day publish the
logarithms of market quotations themselves. For
many purposes, including writing checks and
receiving wages, it is necessary to deal with the
ordinary number system and such things as dollars
and cents, but for the study of relative changes
in things, such as prices, we are better served
by the logarithms of magnitudes than by the
magnitudes themselves.
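The computation behind Table IV G takes only a few lines. A minimal Python sketch (the function name `log_change` is illustrative) that takes the difference of common logarithms for each stock and then its antilogarithm:

```python
import math

# Table IV F: previous day's and this day's closing prices.
QUOTES = {"Stock A": (12.5, 13.0), "Stock B": (162.5, 169.0)}

def log_change(before, after):
    """Difference of common logarithms, i.e. the logarithm of after / before."""
    return math.log10(after) - math.log10(before)

for stock, (before, after) in QUOTES.items():
    d = log_change(before, after)
    print(f"{stock}: log change {d:.4f}, price ratio {10 ** d:.3f}")
```

Both stocks print a log change of .0170 and a price ratio of 1.040, although their absolute changes differ by $6.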

Time Ratios: The quotient of a magnitude, 
such as a wage, a price, a measure of production, 
or a level of attainment, at one time divided by 
the similar measure at a different time is called 
a time ratio. A combination of such time ratios 
yields a general index, such as are the various 
business, cost of living, and price indexes. The 
properties of time ratios and of aggregates of 
them are fully treated by those making and using 
index numbers. It must here suffice to note that 
simple averages of such ratios generally have 
systematic errors of one sort or another due, 
among other things, to the generally skewed nature 
of distributions of ratios, as illustrated in 
Section III, Table IV N and Chart IV XII. 
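A hypothetical two-item example (not from the text) shows one such systematic error. If one price doubles and another halves, the arithmetic mean of the time ratios reports a 25 per cent rise whichever date is taken as base, which cannot be right both ways. The geometric mean, a common alternative in index-number work swapped in here only for contrast, gives 1 both ways:

```python
import math

def arithmetic_mean(xs):
    return sum(xs) / len(xs)

def geometric_mean(xs):
    """Antilogarithm of the mean logarithm."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

# Hypothetical: between two dates one price doubles and another halves.
forward = [2.0, 0.5]                 # later price / earlier price
backward = [1 / r for r in forward]  # earlier price / later price

print(arithmetic_mean(forward), arithmetic_mean(backward))  # 1.25 1.25
print(geometric_mean(forward), geometric_mean(backward))
```

The ratios 2.0 and 0.5 lie symmetrically about 1 on a logarithmic scale but not on the ratio scale, a small instance of the skewness of ratio distributions.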

The Growth Curve: An important temporal series 
arising in biometric studies is one involving 
growth from the beginning, or near the beginning,
to maturity.

The accompanying table gives smoothed scores 
in a reasoning test as given by Kelley (1917). 
Plotted they give a typical growth curve. 

TABLE IV H 




This particular curve is interesting in that
it shows a flattening at ages 13 and 14, which
is not at all characteristic of growth curves of
mental traits; but as the units of measurement,
rather than intrinsic ability, could conceivably
account for the phenomenon, the curve does not
prove, but merely suggests, that there is a
pubertal disturbance. For the purpose of the
present statistical treatment no attention need
be paid to the double inflection of the curve.

Rotating the curve through 90° and looking at 
it in a mirror (as pictured in Chart IV VII) 
shows its general resemblance to an ogive curve. 
It was possible in the case of daily temperatures 
to cumulate scores and obtain ogive curve data. 
By the reverse process it is possible from the 
ogive data to obtain the original growth curve. 
The growth curve may be plotted as herewith: 


CHART IV VII

GROWTH CURVE, REASONING ABILITY

[Abscissa: test score, 0 to 90.]

Thinking of the abscissas as sums of
increments of reasoning ability, and recalling
that the graph is for an average individual whose
maximum development or accumulation is to 94 of
such increments (i.e., the total population of
increments is 94), the graph may be read: At age
7 the individual possesses 11 increments of
reasoning ability; at age 10, 50 increments, etc.
This may be an awkward way of interpreting growth,
but if it is desired to think of growth as a sum
of increments it immediately suggests the
determination of the increments added during each
year of life as follows:
TABLE IV I

                  Yearly Growth
Age      Score    Increment        Age

0        0
                  0                .5 (from 0-1)
1        0
                  .5-              1.5 (from 1-2)
                  .5+              2.5 (from 2-3)
3        1
                  2-               3.5 (from 3-4)
                  2+               4.5 (from 4-5)
5        5
                  3-               5.5 (from 5-6)
                  3+               6.5 (from 6-7)
7        11
                  8                7.5 (from 7-8)
8        19
                  12               8.5, etc.
                  19               9.5
                  12               10.5
                  5                11.5
                  4                12.5
13       71
                  2                13.5
14       73
                  3                14.5
15       76
                  6                15.5
16       82
                  5                16.5
17       87
                  3                17.5
18       90
                  4                (from 18 to adulthood)
Adult    94
                  --
                  94
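The increments are first differences of the growth-curve scores, and cumulating them is the reverse process mentioned above. A minimal Python sketch (function names illustrative), applied to the ages from 13 on, where Table IV I gives every score; `accumulate(..., initial=...)` needs Python 3.8 or later:

```python
from itertools import accumulate

def yearly_increments(scores):
    """Differences between successive cumulative scores."""
    return [later - earlier for earlier, later in zip(scores, scores[1:])]

def cumulate(start, increments):
    """The reverse process: running totals of the increments."""
    return list(accumulate(increments, initial=start))

# Scores at ages 13 to 18 and at adulthood (Table IV I).
scores = [71, 73, 76, 82, 87, 90, 94]
incs = yearly_increments(scores)
print(incs)                # [2, 3, 6, 5, 3, 4]
print(cumulate(71, incs))  # [71, 73, 76, 82, 87, 90, 94]
```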
These growth increments plotted in the form
of an ordinary frequency polygon give the graph
shown in Chart IV VIII.

CHART IV VIII

[Abscissa: age.]


The bi-modality of the growth increment curve 
is of course a consequence of the double in- 
flection of the growth curve. Since the constants 
of this increment curve (mean, skewness, standard 
deviation, etc.) can be readily calculated, the 
curve has certain advantages over the growth 
curve. It should be a very convenient form in 
which to present data for purposes of studying 
variability in rate of growth, variability in 
price changes, etc. If the modifiability of a 
function (by education, by promotional sales 
pressure, by amount of food consumed, etc.) is 
roughly proportional to the normal rate of change 
of the function, the growth increment curve is a 
helpful graphic device. 
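To illustrate how readily the constants of the increment curve are obtained, here is a minimal Python sketch of its mean and standard deviation, treating the increments of Table IV I as frequencies at the mid-year ages. Two assumptions are made here, not in the text: the entries printed .5-, .5+, 2-, 2+, 3-, 3+ are taken as .5, .5, 2, 2, 3, 3, and the last increment (18 to adulthood) is left out because no single age can be assigned to it:

```python
import math

# Mid-year ages 0.5 ... 17.5 and yearly increments from Table IV I
# (entries marked - or + are taken at face value; see the lead-in).
MID_AGES = [k + 0.5 for k in range(18)]
INCREMENTS = [0, .5, .5, 2, 2, 3, 3, 8, 12, 19, 12, 5, 4, 2, 3, 6, 5, 3]

def curve_moments(xs, fs):
    """Mean and standard deviation of a curve with frequencies fs at xs."""
    n = sum(fs)
    mean = sum(f * x for x, f in zip(xs, fs)) / n
    var = sum(f * (x - mean) ** 2 for x, f in zip(xs, fs)) / n
    return mean, math.sqrt(var)

mean, sd = curve_moments(MID_AGES, INCREMENTS)
print(f"mean age of growth {mean:.2f}, standard deviation {sd:.2f} years")
```

On these assumptions the 90 increments have a mean age of about 10.3 years; the same function would serve for skewness or other moments with one more line.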

In dealing with functions in which there is a
loss in a given period, e.g., when an individual
weighs less in one year than in the preceding,
negative frequencies arise. These need cause no
computational trouble if treated strictly
algebraically and the negative sign preserved.

Brown and Thomson (1921) have shown that the
standard deviations of the class frequencies of
such a curve are not given by the ordinary
formula for the standard deviation of the
frequency in a class (see Chapter IX).

SECTION 3. QUANTITATIVE SERIES

When the things observed are such that the
measurements taken vary in a negligible manner
with change in the time when, or place where, they
are taken, we get the usual quantitative or
qualitative series. Observations of the height or
eye color of children fulfill these conditions if
the time interval is so small that growth is
negligible: whether they are measured on the first
of the month, the second, or the fourth makes no
material difference. Again, suppose we give a
30-second speed-of-tapping test. In this function
there is considerable variation not only from day
to day but from minute to minute, so that the
scores on the first, the second, and the fourth
may be quite different. Nevertheless we should,
for most purposes, treat such a set of measures as
a quantitative series, not because they do not
change in time, but because they change only in a
chance manner with a change in time. Expressed in
another way, the interest we take in these
measures is unrelated to changes in time. It not
infrequently happens that the same data, depending
upon the interest which determines the aspect of
them that we study, are now of one type of series
and now of another. When time and place are
insignificant per se, or are so for our purposes,
they are eliminated as axes or dimensions in the
graphic portrayal of the data. There remain, as
dimensions for the chart, the value of the trait
or character measured and the frequency with which
the distinguishable values occur.

The histogram and the frequency polygon: For
illustration let us be concerned not with the
specific dates connected with the maximum daily
temperatures given in Table IV B, but only with
the frequencies of occurrence of the different
temperatures. We now have a quantitative series.
The data are made into a frequency table, as
given in the first column of Table IV J, prior to
plotting as a histogram, Chart IV IX, or as a
frequency polygon, Chart IV X.

CHART IV IX 

DAILY MAXIMUM TEMPERATURES 


NEW YORK CITY— JULY-AUGUST 1917 
CHART IV X


DAILY MAXIMUM TEMPERATURES

NEW YORK CITY, JULY-AUGUST 1917



Either the histogram or the frequency polygon
is a satisfactory device for presenting
quantitative series. The area under the histogram
accurately represents the number of observations,
62 in this case. A little simple geometry shows
that this is also the case with the frequency
polygon. The frequency polygon gives a somewhat
greater idea of continuity than does the
histogram. When drawing the histogram, it is
convenient so to record the abscissa scale that
the rules of the paper used correspond to
boundaries of class intervals. For example, 64.5
and 65.5 are the lower and upper boundaries of the
interval within which lie temperatures recorded
as 65°; this value is the mid-point of the
interval and is called its class index. The
class limits are 64.5 and 65.5, and the class
interval, designated by the letter i, is 1°.
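The "little simple geometry" can also be checked
numerically. In this Python sketch (function names
and frequencies hypothetical), the polygon is
closed at zero one interval beyond each end class,
as the text prescribes below for the point (64, 0),
and its area is found by the trapezoid rule; it
equals the histogram area, the interval times the
total frequency.

```python
def histogram_area(freqs, i):
    """Sum of rectangle areas: each class contributes frequency * interval width."""
    return sum(f * i for f in freqs)


def polygon_area(class_indexes, freqs, i):
    """Area under a frequency polygon closed at zero one interval beyond each end.

    Consecutive plotted points are joined by straight lines, so the area is a
    sum of trapezoids.
    """
    xs = [class_indexes[0] - i] + list(class_indexes) + [class_indexes[-1] + i]
    ys = [0] + list(freqs) + [0]
    return sum((xs[k + 1] - xs[k]) * (ys[k] + ys[k + 1]) / 2
               for k in range(len(xs) - 1))


indexes = [65, 66, 67, 68]   # hypothetical class indexes, i = 1
freqs = [1, 3, 5, 2]         # hypothetical frequencies
```

Each interior point contributes half its height to
the trapezoid on either side, which is why the two
areas always agree.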

The reader will note that "interval" or "class
interval" is used in a double sense. Its most
usual meaning is the distance covered by a class
(usually the same for all classes); that is, it
equals the upper class limit minus the lower
class limit. For the data in the 5° column of
Table IV J, the interval as thus defined is 5°
throughout. "Class interval" may also define
some specific class by stating its upper and
lower limits. Thus, the second class interval in
the 5° column of Table IV J includes values
between 67.5° and 72.5°.

TABLE IV J 

FREQUENCY DISTRIBUTIONS OF DAILY MAXIMUM TEMPERATURES 
NEW YORK CITY, JULY - AUGUST, 1917 


Frequencies for Various Groupings 
TABLE IV J (CONTINUED)

FREQUENCY DISTRIBUTIONS OF DAILY MAXIMUM TEMPERATURES
NEW YORK CITY, JULY - AUGUST, 1917


Temper.    Frequencies for Various Groupings



The exact meaning of a recorded score: It
has become somewhat customary in educational
fields to speak of a child as solving 10 problems
in a speed test, meaning thereby that 10 problems
were solved and the 11th started but not
finished when time was called. In plotting the
distribution of scores the designating number,
10, has been placed at the beginning of the
interval. No objection should be made to this were
the numerical computations in harmony with this
procedure, but very generally such scores have
been treated as exactly 10.0 in calculating
arithmetical averages, with the result that the
curve and the constants computed from the data do
not agree. Not uncommonly such scores have been
treated as 10.0 scores in calculating means and
as 10.5 scores in calculating medians, with the
result that a comparison of mean and median scores
gives an entirely erroneous impression as to the
skewness of the data. This faulty procedure has
probably been followed unwittingly, but
unfortunately with the sanction of teachers. The
following is quoted from page 50 of the Second Year
Book, Division of Educational Research, Los
Angeles, July, 1919:


"LESSON SIX: THE ARITHMETIC MEAN
Method of Finding the Mean

No. Problems    No. Pupils
     12              3          3 × 12 =  36
     11              5          5 × 11 =  55
     10              7          7 × 10 =  70
      9              4          4 ×  9 =  36
      8              2          2 ×  8 =  16
                    21                   213

213 divided by 21 equals 10.14, the mean. The
median in this distribution would be 10.64."

In this lesson problem the mean is in error
if 12 stands for measures in the interval 12.0
to 13.0, and the median (see Formula [4:01]) is in
error if 12 implies the interval 11.5 to 12.5.
The error here cited probably grew out of an error
in labeling a distribution. Uniformity is needed,
and it would be in harmony with well-nigh
universal procedure in the physical and biological
fields to consider a score of 10 as being also a
class index, or mid-point of an interval. Should
this lower the grade of a few million school
children by one-half a point, no harm would be
done, for all relative positions are maintained.
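The discrepancy in the quoted lesson can be
reproduced with a short Python sketch. Here an
offset of 0 treats a recorded score as a class
index (interval x - .5 to x + .5, the convention of
this text), and an offset of .5 treats it as the
start of an interval (x to x + 1). The median is
found by the usual linear interpolation within the
median class; the function names are illustrative.

```python
# Lesson data: number of problems solved -> number of pupils.
scores = {12: 3, 11: 5, 10: 7, 9: 4, 8: 2}


def grouped_mean(dist, offset):
    """Mean when a recorded score x stands for the class mid-point x + offset."""
    n = sum(dist.values())
    return sum((x + offset) * f for x, f in dist.items()) / n


def grouped_median(dist, offset, width=1.0):
    """Median by linear interpolation within the class holding the N/2-th case."""
    half = sum(dist.values()) / 2
    below = 0
    for x in sorted(dist):
        f = dist[x]
        lower_limit = x + offset - width / 2
        if below + f >= half:
            return lower_limit + (half - below) / f * width
        below += f
    raise ValueError("empty distribution")


# Consistent: a score is a class index for both measures.
consistent = (grouped_mean(scores, 0.0), grouped_median(scores, 0.0))
# The lesson's mixture: exactly 10.0 for the mean, 10.0 to 11.0 for the median.
mixed = (grouped_mean(scores, 0.0), grouped_median(scores, 0.5))
```

Under either consistent convention the mean and the
median agree for these data (10.14 with offset 0,
10.64 with offset .5); only the lesson's mixture
of the two produces the apparent half-point gap.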

Certain students have objected to this
procedure when the score is zero. Situations exist
wherein a negative score is impossible, so that a
zero class (interval, according to the rule, -.5 to
.5) would actually have all of its frequency in
the region .0 to .5. Though the writer has never
run across an experimental situation in which the
matter was important, should it be so, a different
procedure, fully explained, as to class indexes
and class limits should be followed. The writer
believes that the use of unequal intervals carries
with it such annoyance in harmonizing
computational procedures leading to the mean with
those leading to percentiles, and both of these
with graphic procedures, as to warrant the general
practice of treating every score (including 0) as
a class index.

Throughout this text a score, no matter how
derived originally, is uniformly to be interpreted
as a class index or mid-point of an interval
extending from half an interval below to half an
interval above.

A situation of great importance involving the
meaning of a score arises in connection with
chronological age. A practice which is not
universal, but so common as to demand recognition,
is to speak of individuals of various age groups,
8-year-olds, 9-year-olds, etc., upon the basis of
age at last birthday and not of age at nearest
birthday. Thus a child 8 years, 11 months, and
29 days old will be called an 8-year-old. If this
practice were universal, we could unequivocally
call the class index of 8-year-olds 8.5, of
9-year-olds 9.5, etc., and we should then regard
a child's age, if it is merely stated that he is
an 8-year-old, as 8.5. The lack of universality
of a single practice in this important matter
necessitates that each author specify in detail
the exact meaning of the classes that he gives in
any age classification. We shall here follow the
practice of considering "8-year-olds" to consist
of all those lying between the limits 8.00 and
9.00, which provides a class with mid-point
8.50.
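Under the convention just adopted, an age class
and its class index follow from completed years
plus one-half. A minimal Python sketch, assuming
ages are reckoned from calendar dates; the
function names and dates are hypothetical.

```python
from datetime import date


def age_at_last_birthday(born: date, on: date) -> int:
    """Completed years of age on the date `on`."""
    years = on.year - born.year
    if (on.month, on.day) < (born.month, born.day):
        years -= 1   # this year's birthday not yet reached
    return years


def age_class_index(age_class: int) -> float:
    """Mid-point of the class [age, age + 1): 8-year-olds -> 8.5."""
    return age_class + 0.5


# A child one day short of the ninth birthday is still an 8-year-old.
child = age_at_last_birthday(date(2000, 1, 2), date(2009, 1, 1))
```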

When drawing a frequency polygon it is con- 
venient so to place the abscissa scale that class 
indexes coincide with rulings of the paper used. 
The number of days shown in Table IV J as having 
a maximum temperature of 64° was zero. Ac- 
cordingly the point (64, 0) is an essential point 
on the curve. The curve should be definitely 
terminated at this point and not at such a point 
as (64.5, 0). For both the histogram and the 
frequency polygon it is desirable that there be
at least one interval between the bounding rules
of the chart and the initial and final values
plotted. Frequently it is desirable to shade the
histogram so that the bounds of the curve will 
not be confused with the rulings of the paper 
used. Such shading is hardly necessary in the 
case of the frequency polygon. 

The curves as drawn show 11 modes, or
temperature values for which the frequency is
greater than for immediately smaller or larger
values, as follows: 65.5, 70.5, 75, 78, 80, 83, 85,
88, 90, 95.5, 98. If these 62 days are taken as a
sample of July-August days in general, as would
typically be the case, since one's interest in
these data would be as evidence of future
phenomena, it is not to be expected that all of
these modes are trustworthy, that is, that the
same modes would occur in 1918, 1919, etc. These
sundry modes are features of the sample, due to
chance or unknown causes, and are not to be taken
as descriptive of the population: the general
distribution, or average distribution, of maximum
daily temperatures if the July-August records for
many years are combined. Probably the major mode,
frequently called "the mode," in the neighborhood
of 81°, is a feature that exists in the population
and is found here, somewhat distorted, in the
sample. Let us now group the data of the first
column of Table IV J into larger and larger
classes, resulting in the several columns of
Table IV J, and note what happens to the sundry
modes.

Grouping and labeling classes: The
frequencies in the various columns of Table IV J
are written opposite their respective class
indexes. Note that when the interval is odd, 1, 3,
5, etc., the class index is in each instance
divisible by the interval, and the class limits
are fractional, lying one-half the size of the
interval below and above the class index. Thus,
for the interval of 3, the first class has a
class index of 66 (which is divisible by 3), a
lower limit of 64.5, and an upper limit of 67.5.
It is desirable that the class limits be
fractional, for then, if the original data are
recorded to the nearest unit, no value falls
exactly upon a class limit and there is never
ambiguity as to which class to assign a value.
Classes may be unambiguously labeled in any one
of three ways, as illustrated in Table IV K for
an interval of 3.

The first method is by recording class indexes
and is frequently the simplest and neatest in
appearance. The second method is by recording
class limits and should be employed if the size
of the class interval is not constant throughout
the entire range of the data. The third method
is by recording the inclusive integral values
and is very satisfactory when making out tally
sheets from original data. If labeling is by this
third method a value such as 70 is immediately
seen to lie in the second class, while if labeling
is by class indexes one must first note that the
value 70 lies closer to 69 than to any other class
index before it can be assigned to its proper
class.

TABLE IV K

THE LABELING OF CLASSES

(i = 3)

                    X               X                 X
              (class index)  (class limits)  (integral values)

First class         66         64.5 - 67.5        65 - 67
Second class        69         67.5 - 70.5        68 - 70
Third class         72         70.5 - 73.5        71 - 73
etc.               etc.           etc.              etc.

If the size of the interval is even, it is
impossible to have an integral class index and a
fractional class limit at one and the same time.
Of these two the fractional class limit is the
more important. Since, when the interval is even,
the fractional class index cannot be exactly
divisible by the interval, we shall always so
select the class limits as to make the lowest
integral value in a class divisible by the
interval. Thus, if i = 4, the first class is to be
consistent with the following labeling: 65.5, or
63.5-67.5, or 64-67 (the 64 being divisible by 4).

This procedure is to be followed rigorously,
including the common and important case when
i = 10. Here the classes run as follows: 60-69,
70-79, etc., and the class indexes are 64.5, 74.5,
etc., not 65, 75, etc., and the class limits
are 59.5, 69.5, etc., not 60, 70, etc. If the
original data are dollar and cent magnitudes, but
not amounts in fractional parts of a cent, then
the classes could be 60.00-69.99, 70.00-79.99,
etc., having class indexes 64.995, 74.995, etc.,
and class limits 59.995, 69.995, etc. It is to
be noted that even in this case 65, 75, etc., are
not the class indexes.
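The labeling rules of the last few paragraphs can
be collected into a small Python sketch for data
recorded in whole units. For an odd interval the
class index is made divisible by i; for an even
interval the lowest integral value in the class is.
The function names are hypothetical. Dollar and
cent data can be handled by working in whole
cents.

```python
def first_class_start(x_min: int, i: int) -> int:
    """Lowest integral value of the class that contains x_min."""
    if i % 2 == 1:
        h = (i - 1) // 2                     # integral values on each side of the index
        return i * ((x_min + h) // i) - h    # makes the class index divisible by i
    return i * (x_min // i)                  # lowest integral value divisible by i


def label(a: int, i: int):
    """The three labelings of the class whose lowest integral value is a."""
    index = a + (i - 1) / 2                  # class index (mid-point)
    limits = (a - 0.5, a + i - 0.5)          # fractional class limits
    integral = (a, a + i - 1)                # inclusive integral values
    return index, limits, integral


def classes(x_min: int, x_max: int, i: int):
    """Label every class of width i needed to cover x_min .. x_max."""
    a = first_class_start(x_min, i)
    out = []
    while a <= x_max:
        out.append(label(a, i))
        a += i
    return out
```

For example, classes(65, 73, 3) reproduces the
three rows of Table IV K, and label(6000, 1000),
in cents, gives a class index of 6499.5 and limits
of 5999.5 and 6999.5 cents, that is, 64.995,
59.995, and 69.995 dollars.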

In spite of all efforts to employ classes so
that boundary values will not be found in the
original data, it will sometimes happen that this
cannot be done, so we must adopt a rule for the
allocation of boundary values. We do not wish
always to put the boundary value in the class
above, or always in the class below, for either
of these procedures will introduce a systematic
error. To avoid this, we adopt the rule of
assigning a boundary value to the neighboring
value that is even. For example, the value 66.5
will be called 66, and not 67, and thus it is
thrown into the lower class. The value 73.5 is to
be called 74, and not 73, and is thus thrown into
the higher class. This rule is commonly followed
by astronomers and workers in the physical
sciences. It is a serviceable one for general
adoption, though it does not meet all
difficulties. If boundaries of classification can
be so arranged that the rule is not called into
play, they should be so arranged.
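The boundary rule is the one usually called
rounding half to even. In Python it is available
through the standard decimal module; working from
the recorded digits as strings avoids binary
floating-point surprises. A sketch, with a
hypothetical function name:

```python
from decimal import Decimal, ROUND_HALF_EVEN


def allocate(recorded: str) -> int:
    """Round a recorded value to a whole unit, sending an exact half to the even neighbor."""
    return int(Decimal(recorded).quantize(Decimal("1"), rounding=ROUND_HALF_EVEN))


lower = allocate("66.5")    # called 66, thrown into the lower class
higher = allocate("73.5")   # called 74, thrown into the higher class
```

Python's built-in round() follows the same rule for
floats, so round(66.5) gives 66 and round(73.5)
gives 74; but a decimal value that binary floats
cannot represent exactly may not round as its
printed digits suggest, hence the strings.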


Effect of grouping upon modes: Following
this diversion upon grouping and labeling, we
will return to the question of the effect of
grouping upon modes. An examination of the
frequencies in the column of Table IV J for which
i = 2 shows that there are five modes, at
temperatures as follows: 65.5, 70.5, 74.5, 80.5,
98.5. With a little coarser grouping, i = 3, there
are three modes, 66, 81, 97.5. With the grouping
i = 4 there are two modes, 81.5 and 97.5. With
the grouping i = 5 there is but one mode, 80.
With the next coarser grouping, i = 6, it would
be found that two modes appear. However, we may
say that in general the coarser the grouping, the
greater the tendency to eliminate minor modes.
The coarseness of grouping systematically affects
the height of the curve at the mode, as shown in
Table IV L herewith.

TABLE IV L 

MODAL FREQUENCY WITH VARIOUS GROUPINGS 


Size of 
Interval 

i° 

2° 

3° 

4° 

5° 

6° 

7° 

8° 

9° 

10° 

13° 

20° 

91° 

F requency 
at Mode 

10 

9 

± 

4 

4 

5 f 

4 

5 

4 

Lj. L 
10 

>4 

4 

j21. 
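The effect of coarser grouping on the number of
modes can be tried on any set of unit-interval
frequencies. The Python sketch below uses
hypothetical frequencies, not those of Table IV J.
It applies the text's definition of a mode (a
frequency greater than those of the immediately
smaller and larger values), places a run of tied
frequencies at the mid-point of the run (an
assumption of this sketch), and regroups with
i = 2 starting at 64, the lowest integral value
divisible by 2. The function names are
hypothetical.

```python
def modes(indexes, freqs):
    """Class indexes whose frequency exceeds that of both neighboring classes.

    A run of equal frequencies higher than the classes on either side counts as
    one mode, located at the mid-point of the run. Classes beyond the ends are
    taken to have zero frequency.
    """
    found, n, k = [], len(freqs), 0
    while k < n:
        j = k
        while j + 1 < n and freqs[j + 1] == freqs[k]:
            j += 1                                   # extend a run of ties
        left = freqs[k - 1] if k > 0 else 0
        right = freqs[j + 1] if j + 1 < n else 0
        if freqs[k] > left and freqs[k] > right:
            found.append((indexes[k] + indexes[j]) / 2)
        k = j + 1
    return found


def regroup(freq_by_value, i, start):
    """Sum unit-class frequencies into classes of width i.

    start is the lowest integral value of the first class; the class
    indexes are start + (i - 1) / 2, start + i + (i - 1) / 2, and so on.
    """
    top = max(freq_by_value)
    indexes, freqs = [], []
    a = start
    while a <= top:
        indexes.append(a + (i - 1) / 2)
        freqs.append(sum(freq_by_value.get(v, 0) for v in range(a, a + i)))
        a += i
    return indexes, freqs


# Hypothetical daily maxima in one-degree classes.
data = {65: 1, 66: 0, 67: 2, 68: 1, 69: 3, 70: 2, 71: 4, 72: 3, 73: 2, 74: 1}
fine_modes = modes(list(data), list(data.values()))
coarse_indexes, coarse_freqs = regroup(data, 2, 64)
coarse_modes = modes(coarse_indexes, coarse_freqs)
```

Here the four modes found with one-degree classes
(65, 67, 69, 71) reduce to a single mode, at 70.5,
when the classes are two degrees wide.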


For the purposes of graphic portrayal we should
follow the principle of grouping coarsely enough
to suppress such secondary modes as are probably
due to chance, but not such as are probably
evidence of secondary modes existing in the
population. In short, we should aim so to group
as to suppress the idiosyncrasies of the sample
while preserving the verities of the population.
The precise answer to the question "How coarsely
should we group to accomplish this result?"
depends upon the nature of the distribution as
well as the size of the sample. Since many
distributions are quite similar to the normal
distribution in the neighborhood of their major
modes, an answer strictly applicable to samples
drawn from normal populations may be quite
serviceable for many other situations as well,
particularly if concern is as to the location of
the major mode.

Since a graphic portrayal that is misleading
even as to the location of the major mode will
be more so as to the existence and location of
minor modes, we surely should adopt a sufficiently
coarse grouping that there is probably not a
misjudgment as to the interval in which the
population major mode lies.

The detailed steps involved in the solution
of this problem and leading to Table IV M are
given in Chapter XIII, Sections 10 and 11. The
reader is advised to increase, generally very
slightly, the number of classes he employs if his
data are leptokurtic, i.e., show frequencies in
excess of those normally expected in the extreme
tail regions and at the mode.

TABLE IV M

THE NUMBER OF CLASSES TO USE FOR GRAPHIC
PORTRAYAL OF SAMPLES OF SIZE N DRAWN
FROM A NORMAL POPULATION

      N         No. of Classes

     4-5              2
     6-8              3
     9-14             4
    15-21             5
    22-32             6
    33-46             7
    47-64             8
    65-89             9
    90-117           10
   118-153           11
   154-192           12
   193-255           13
   256-315           14


Applying the information given in Table IV M,
we see that we should employ 8 classes to present
the distribution of the 62 maximum daily
temperatures. The range of temperatures is
(98 - 65 + 1), or 34°. Dividing 34 by 8 gives 4.25,
which, to the nearest integer, gives 4 as the
preferred size of interval. A plot with this
interval is shown in Chart IV XI. The reader is
asked to fix firmly in mind that a grouping as
coarse as this is recommended for graphic
portrayal only, and that a finer grouping, i=12,
as explained in Chapter VII, is desirable for
statistical computations leading to measures of
central tendency, of variability, and of
correlation.
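Table IV M lends itself to a simple lookup. The
Python sketch below encodes the table, then divides
the range by the suggested number of classes and
rounds, as in the computation just made for the
temperature data. The function names are
illustrative.

```python
from bisect import bisect_left

# Upper end of each N range in Table IV M, paired with its number of classes.
TABLE_IV_M = [(5, 2), (8, 3), (14, 4), (21, 5), (32, 6), (46, 7), (64, 8),
              (89, 9), (117, 10), (153, 11), (192, 12), (255, 13), (315, 14)]


def classes_for(n: int) -> int:
    """Number of classes Table IV M suggests for a sample of size n."""
    if not 4 <= n <= TABLE_IV_M[-1][0]:
        raise ValueError("n outside the range covered by Table IV M")
    uppers = [upper for upper, _ in TABLE_IV_M]
    return TABLE_IV_M[bisect_left(uppers, n)][1]


def preferred_interval(lowest: int, highest: int, n: int) -> int:
    """Range (highest - lowest + 1) divided by the suggested classes, to the nearest integer."""
    return round((highest - lowest + 1) / classes_for(n))


k = classes_for(62)                       # 62 daily maxima
i = preferred_interval(65, 98, 62)        # 34 / 8 = 4.25
```

Python's round() sends an exact half to the even
integer; the text does not say how such a case
should be settled.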
CHART IV XI

DAILY MAXIMUM TEMPERATURES 

NEW YORK CITY— JULY-AUGUST 1917 



Chart IV XI shows a decided major mode in the 
neighborhood of 82° and a minor mode in the 
neighborhood of 98°. The grouping was such that 
the existence of a major mode in the neighborhood 
of 82° can be trusted, but the class frequencies 
in the neighborhood of 98° are so small that the 
existence of a mode in this region is to be 
doubted. In other words, the grouping employed
was designed to give a certain confidence with
reference to the location of a major mode only,
to within about half an interval. The more
precise method for locating this mode is given in
Chapter VII, Section 3. A considerably coarser 
grouping would be necessary before a much smaller 
minor mode found in the sample can be assumed to 
be a feature of the population. 

Frequency polygon of a skewed distribution: 
As an illustration of the tendency for a distri- 
bution of ratios to be skewed, and as an im- 
portant problem in plotting, we will study the 
data of Table IV N, drawn from Mitchell, as quoted
by Secrist (1917).
TABLE IV N

DISTRIBUTION OF 5578 CASES OF CHANGE IN THE WHOLESALE
PRICES OF COMMODITIES FROM ONE YEAR TO THE NEXT

PER CENT OF CHANGE FROM THE AVERAGE PRICE OF THE
PRECEDING YEAR

(FALLING PRICES)

Per cent of change    Number of cases
  54-55.9                    1
  50-51.9                    1
  48-49.9                    1
  46-47.9                    1
  44-45.9                    2
  42-43.9                    4
  40-41.9                    5
  38-39.9                    5
  36-37.9                    7
  34-35.9                   10
  32-33.9                    7
  30-31.9                   16
  28-29.9                   27
  26-27.9                   17
  24-25.9                   32
  22-23.9                   39
  20-21.9                   45
  18-19.9                   71
  16-17.9                   76
  14-15.9                  107
  12-13.9                  120
  10-11.9                  173
   8- 9.9                  200
   6- 7.9                  238
   4- 5.9                  329
   2- 3.9                  375
  Under 2                  405
  No change                697

(RISING PRICES)

Per cent of change    Number of cases
  Under 2                  410
   2- 3.9                  355
   4- 5.9                  356
   6- 7.9                  261
   8- 9.9                  237
  10-11.9                  167
  12-13.9                  115
  14-15.9                  106
  16-17.9                  102
  18-19.9                   73
  20-21.9                   65
  22-23.9                   45
  24-25.9                   47
  26-27.9                   29
  28-29.9                   30
  30-31.9                   22
  32-33.9                   17
  34-35.9                   18
  36-37.9                   11
  38-39.9                   17
  40-41.9                   14
  42-43.9                    6
  44-45.9                   10
  46-47.9                   11
  48-49.9                    5
  50-51.9                    1
  52-53.9                    4
  54-55.9                    3
  56-57.9                    1
  58-59.9                    6
  60-61.9                    4
  66-67.9                    4
  68-69.9                    3
  70-71.9                    1
  72-73.9                    4
  74-75.9                    1
  80-81.9                    1
  82-83.9                    1
  84-85.9                    1
  86-87.9                    1
 100-101.9                   1
 102-103.9                   1

Total                    5,578


It will be noticed that the class intervals
extend over ranges of two units; e.g., there are
five class intervals covering a rise in prices
from 10 per cent to (but not including) 20 per
cent. With no direction to the contrary, it is to
be presumed that the class designated in the
table by "54-55.9" includes all measures with
values between the limits 53.95 and 55.95; that
the next class includes measures between 51.95
and 53.95; etc. That is to say, presumably the
data have been recorded to but one decimal
place, so that such measures as 53.86 and 53.92
are called 53.9 and a measure such as 53.96 is
recorded as 54.0. If the recorder encountered a
measure 53.95, he had to decide arbitrarily whether
it would be called 53.9 or 54.0. For the data
in hand it is not definitely known how such a
case would have been decided, but we shall assume
that the preferred and customary procedure was
followed.

If the class intervals run in order from 53.95
to 55.95, 51.95 to 53.95, . . . 1.95 to 3.95, it
is found that the next class, in order to
extend over the same range, would be from -.05
to 1.95, i.e., from an increase in price of .05
per cent to a decrease of 1.95 per cent. This,
however, cannot be the case, as a very large
frequency, 697, is recorded for "no change." The
way the data are recorded would suggest a class
interval corresponding to "no change," but this
cannot be so, as the intervals on either side
preempt the space. The "no change" cases
presumably fall at a point and not at variable
points within an interval. In plotting the data,
therefore, the "no change" interval must be
squeezed out and its frequency, 697, distributed
between the neighboring classes. We will assign
348 to the "under 2, falling prices" interval, and
the remainder, 349, to the "under 2, rising
prices" interval. There still is a slight
discrepancy (.05) in the ranges of these two
middle intervals, but as it cannot be positively
accounted for without recourse to the original
data, it is passed over.

For convenience in tabulation and plotting we 
will consider the first class interval to extend 
from 54.00 to 56.00 and to have its mid-point or 
class index 55.00, the second a mid-point at 
53.00, etc., and the frequencies as before. 

The variable in Table IV N has a range of
values from approximately -55.95 to 103.95, or
159.90. The number of cases is 5,578. Reference
to Chapter XIII, Table XIII H, suggests that we
should have about 33 plotted points, which
corresponds to an interval of 4.8. However, the
data are very leptokurtic, so that an interval of
this size will be coarser than would be desirable
in the neighborhood of the mode and finer than
desirable in the tail regions. As we are limited
in our options to 2, 4, 6, etc., we shall choose
an interval of 4, tabulating the data as shown
in Table IV O.
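The steps just described, squeezing out the "no
change" class and then pairing adjacent
2-per-cent classes, can be sketched in Python.
Falling prices are given negative class indexes.
The counts are transcribed from Table IV N, with
the 66-67.9 and 70-71.9 rising-price counts read
from the corresponding classes of Table IV O. The
function names are hypothetical.

```python
# Table IV N: lower end of each 2-per-cent class -> number of cases.
FALLING = {54: 1, 50: 1, 48: 1, 46: 1, 44: 2, 42: 4, 40: 5, 38: 5, 36: 7,
           34: 10, 32: 7, 30: 16, 28: 27, 26: 17, 24: 32, 22: 39, 20: 45,
           18: 71, 16: 76, 14: 107, 12: 120, 10: 173, 8: 200, 6: 238,
           4: 329, 2: 375, 0: 405}          # key 0 stands for "under 2"
RISING = {0: 410, 2: 355, 4: 356, 6: 261, 8: 237, 10: 167, 12: 115, 14: 106,
          16: 102, 18: 73, 20: 65, 22: 45, 24: 47, 26: 29, 28: 30, 30: 22,
          32: 17, 34: 18, 36: 11, 38: 17, 40: 14, 42: 6, 44: 10, 46: 11,
          48: 5, 50: 1, 52: 4, 54: 3, 56: 1, 58: 6, 60: 4, 66: 4, 68: 3,
          70: 1, 72: 4, 74: 1, 80: 1, 82: 1, 84: 1, 86: 1, 100: 1, 102: 1}
NO_CHANGE = 697


def two_per_cent_classes():
    """Class index -> cases, falling prices negative; 'no change' split 348 / 349."""
    dist = {-(low + 1): f for low, f in FALLING.items()}   # 54-55.9 falling -> -55
    dist.update({low + 1: f for low, f in RISING.items()})  # 54-55.9 rising -> 55
    to_falling = NO_CHANGE // 2                              # 348
    dist[-1] += to_falling
    dist[1] += NO_CHANGE - to_falling                        # 349
    return dist


def four_per_cent_classes(dist2):
    """Merge adjacent 2-per-cent classes into 4-per-cent classes indexed ..., -4, 0, 4, ..."""
    dist4 = {}
    for index, f in dist2.items():
        key = (index + 2) // 4 * 4       # class running from key - 2 to key + 2
        dist4[key] = dist4.get(key, 0) + f
    return dict(sorted(dist4.items()))


dist2 = two_per_cent_classes()
dist4 = four_per_cent_classes(dist2)
```

The 4-per-cent class with index 0 then holds
753 + 759 = 1,512 cases, and the total, 5,578, is
unchanged by the regrouping.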




GRAPHIC METHODS 


TABLE IV O

PER CENT OF CHANGE FROM THE AVERAGE PRICE OF THE
PRECEDING YEAR (negative values are falling prices)

 CLASS INTERVAL OF      CLASS INTERVAL OF 2 PER CENT
 4 PER CENT

 Per cent   Number      Per cent   Number   Per cent   Number
 of change  of cases    of change  of cases of change  of cases

   -56        1         -57        0       -55        1
   -52        1         -53        0       -51        1
   -48        2         -49        1       -47        1
   -44        6         -45        2       -43        4
   -40       10         -41        5       -39        5
   -36       17         -37        7       -35       10
   -32       23         -33        7       -31       16
   -28       44         -29       27       -27       17
   -24       71         -25       32       -23       39
   -20      116         -21       45       -19       71
   -16      183         -17       76       -15      107
   -12      293         -13      120       -11      173
    -8      438          -9      200        -7      238
    -4      704          -5      329        -3      375
     0     1512          -1      753         1      759
     4      711           3      355         5      356
     8      498           7      261         9      237
    12      282          11      167        13      115
    16      208          15      106        17      102
    20      138          19       73        21       65
    24       92          23       45        25       47
    28       59          27       29        29       30
    32       39          31       22        33       17
    36       29          35       18        37       11
    40       31          39       17        41       14
    44       16          43        6        45       10
    48       16          47       11        49        5
    52        5          51        1        53        4
    56        4          55        3        57        1
    60       10          59        6        61        4
    64        0          63        0        65        0
    68        7          67        4        69        3
    72        5          71        1        73        4
    76        1          75        1        77        0
    80        1          79        0        81        1
    84        2          83        1        85        1
    88        1          87        1        89        0
    92        0          91        0        93        0
    96        0          95        0        97        0
   100        1          99        0       101        1
   104        1         103        1       105        0

 Total    5,578                                     5,578


The frequency polygon, Chart IV XII, seems
better suited to the data in hand than a
histogram, as it gives the impression of a more
pronounced mode; and in this case, judging by the
number of zero-change items, this feature should
not be suppressed.


CHART IV XII 


CHANGE IN 5578 WHOLESALE PRICES 
FROM ONE YEAR TO NEXT 



Five ways of drawing a graph: There are
three common ways of making a graph: (a) by
erecting rectangles upon the appropriate base
intervals; (b) by connecting plotted points by
means of straight lines; and (c) by drawing a
smooth curve through or near all the points which
fits the data as nearly as can be determined
visually. For quantitative data, (a) and (b)
result in the histogram and frequency polygon
respectively. A fourth, though less common, way
(d) is to plot from smoothed data; and a fifth
(e) is the advanced method of mathematically
determining the equation of the curve which best
fits the data and plotting the same. Methods
(a), (b), and (e), and usually (d), preserve
areas, i.e., the total area under the curve is
equal to the number of cases in the sample.
Method (e) also preserves other important features.
In using method (c) there should be a definite
attempt to preserve areas; that is, if the curve
as drawn lies above any point, it should lie below
some other, or, more accurately, the sum of the
vertical distances by which it lies above points
in the actual distribution should equal the sum of
the distances by which it lies below other points.
In drawing a free curve for such a distribution
as that of incomes in the U.S., the preservation
of the total area is difficult to insure, but for
maximum temperatures, Table IV J and Chart IV X,
it can be accomplished with fair accuracy and
little trouble. The personal element which enters
into method (c) generally makes it inadvisable
for published work; but for original, hasty, and
personal research it may well be used frequently.
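The text does not prescribe a particular smoothing
for method (d). As one illustration, assumed here
rather than taken from the text, a running mean
with weights 1/4, 1/2, 1/4 preserves the total
frequency exactly, provided one empty class is
added beyond each end. The second function is a
rough check for method (c): the signed vertical
distances of a drawn curve from the plotted points
should sum to about zero. Both names are
hypothetical.

```python
def smooth_121(freqs):
    """Running weighted mean (1/4, 1/2, 1/4) with an empty class added at each end.

    Returns len(freqs) + 2 values whose total equals the total of freqs,
    because every original frequency is spread over three classes with
    weights summing to 1.
    """
    padded = [0, 0] + list(freqs) + [0, 0]
    return [(padded[k - 1] + 2 * padded[k] + padded[k + 1]) / 4
            for k in range(1, len(padded) - 1)]


def net_vertical_deviation(curve_heights, freqs):
    """Sum of (curve height - observed frequency) at the class indexes.

    A value near zero means the free-hand curve lies above some points
    by as much, in total, as it lies below others.
    """
    return sum(c - f for c, f in zip(curve_heights, freqs))


smoothed = smooth_121([1, 3, 5, 2])   # hypothetical frequencies
```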

The cumulative frequency curve, or ogive, or
percentile graph: When the frequencies in the
successive classes of an ordinary unimodal
frequency distribution are cumulated, interval by
interval, beginning at either end, there result
the frequencies having values of the variate less
than, or greater than, the successive limits of
the intervals of the frequency distribution. The
curve plotted from these is variously called a
"cumulative frequency graph," an "ogive," or, in
the important case where percentage frequencies
are used, a "percentile graph" or "percentile
curve."
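The cumulation itself is mechanical. In this
Python sketch, with hypothetical frequencies, the
"less than" count is attached to the upper limit of
each class and the "greater than" count to its
lower limit, never to the class index. The
function names are illustrative.

```python
from itertools import accumulate


def less_than_points(lowest_limit, i, freqs):
    """(upper class limit, number of cases below that limit) for each class."""
    return [(lowest_limit + i * (k + 1), c)
            for k, c in enumerate(accumulate(freqs))]


def greater_than_points(lowest_limit, i, freqs):
    """(lower class limit, number of cases above that limit) for each class."""
    n = sum(freqs)
    below = [0] + list(accumulate(freqs))[:-1]
    return [(lowest_limit + i * k, n - b) for k, b in enumerate(below)]


def as_percentages(points, n):
    """Convert cumulative counts to percentages of n, for a percentile graph."""
    return [(x, 100 * c / n) for x, c in points]


freqs = [1, 1, 0, 2]                       # hypothetical one-degree classes from 64.5
lt = less_than_points(64.5, 1, freqs)
gt = greater_than_points(64.5, 1, freqs)
```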

The temperature data, as given in the first 
two columns of Table IV J, may be used to illus- 
trate such a graph. From the column of Table 
IV J giving the finest grouping available, we 
immediately obtain Table IV P. 

TABLE IV P

CUMULATIVE FREQUENCIES OF DAILY MAXIMUM TEMPERATURES
New York City, July-August, 1917

                 No. of days        Proportion of days
                 having lower       having lower
Temperatures     maximum            maximum
                 temperatures       temperatures

The student should carefully note the exact
boundary values which apply to the values given
in column 3. Table IV J states that the maximum
temperature for the coldest day was 65°. As the
grouping interval is 1°, this asserts that for
this day the maximum temperature was greater
than 64.5° and less than 65.5°. We therefore can
assert, as is done in column two of Table IV P,
that there was one day with a maximum temperature
less than 65.5°. Of course it is incorrect to
assert that one day had a maximum temperature
less than 65°. Unfortunately this type of error
is common, and the student is advised to compute
points on published percentile charts for
himself, if the issue is important and if the
basic data are given so that such computation is
possible.

It is also to be noted that the information
given by the fourth row, that "2 days had a
maximum temperature less than 66.5," is as true
as, and neither more nor less true than, that
given by the fifth, sixth, and seventh rows, that
"2 days had a maximum temperature less than 67.5,
68.5, and 69.5," respectively. Since these are all
equally sound, we can secure a certain desirable
smoothing and a simplicity of statement if, prior
to plotting, we replace these four statements by
the single statement, "2 days had a maximum
temperature less than 68.0," the 68.0 being the
average of 66.5 and 69.5. This is recommended
procedure wherever multiple statements occur,
except in the case of statements referring to
terminal situations.
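The recommended smoothing, dropping the terminal
statements and replacing each run of equal
cumulative counts by a single point at the average
of its first and last limits, might be sketched in
Python as follows. The input is the lower end of
Table IV P as described in the text; the function
names are hypothetical.

```python
def plottable(points, n):
    """Drop terminal statements: counts of 0 or n locate no percentile."""
    return [(x, c) for x, c in points if 0 < c < n]


def collapse_runs(points):
    """Replace each run of equal cumulative counts by one point at the
    average of the first and last limits in the run."""
    out, k = [], 0
    while k < len(points):
        j = k
        while j + 1 < len(points) and points[j + 1][1] == points[k][1]:
            j += 1
        out.append(((points[k][0] + points[j][0]) / 2, points[k][1]))
        k = j + 1
    return out


# (upper limit, number of days with a lower maximum temperature), n = 62 days.
points = [(63.5, 0), (64.5, 0), (65.5, 1), (66.5, 2), (67.5, 2),
          (68.5, 2), (69.5, 2), (70.5, 3)]
smoothed = collapse_runs(plottable(points, 62))
```

The three points that remain, at 65.5, 68.0, and
70.5, give the proportions .016, .032, and .048 of
the 62 days.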

At the lower end we have "zero days had a maximum
temperature less than 64.5." Also "less than 63.5."
Also "less than 62.5." And so on, without limit.
Since the series 64.5, 63.5, 62.5, etc., continues
without limit, the average of the smallest and the
largest is indefinite. Therefore we will not plot
it, and we will not assume that the sample gives us
any trustworthy information about the temperature
such that in the population "zero days will have a
lower maximum temperature." In other words, the zero
percentile (and likewise the 100th) is completely
indeterminate from any information given by the
sample. We accordingly will not plot the zero
percentile (nor the 100th percentile). In fact, the
lowest percentile to be plotted is that given by the
proportion of cases in the smallest interval having
one or more cases in it. This proportion, in this
problem, is 1/62, or .016, so that the lowest
computable percentile is the 1.6 percentile. The
highest computable percentile is given by one minus
the proportion of cases in the highest interval
having one or more cases in it. In this problem
this is 1 - 2/62, or .968, so that 96.8 is the
highest computable percentile.
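The two bounds can be computed directly from the class frequencies. The sketch below assumes a plain list of frequencies, lowest class first; the grouping in the usage line is hypothetical, lumping the middle 59 cases into one class, since only the end classes matter.

```python
def computable_range(freqs):
    """Lowest and highest computable percentiles, as proportions.

    freqs -- class frequencies, lowest class first (empty classes allowed)
    Lowest:  share of cases in the lowest class having one or more cases.
    Highest: one minus the share in the highest such class.
    """
    n = sum(freqs)
    occupied = [f for f in freqs if f > 0]
    return occupied[0] / n, 1 - occupied[-1] / n


# Hypothetical grouping of 62 cases: 1 in the lowest occupied class,
# 2 in the highest, the other 59 lumped together.
low, high = computable_range([0, 1, 59, 2, 0])
print(f"P{100 * low:.1f} to P{100 * high:.1f}")   # P1.6 to P96.8
```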

Another way of expressing these facts is to
say that from this sample of 62 days, percentiles
less than 1.6 and greater than 96.8 have infinite
standard errors, while percentiles between these
two values have finite standard errors, though, as
will be shown later, the presumptive errors in the
computed percentiles near 1.6 and 96.8 will be
very large.

If we smooth as suggested we obtain the fol- 
lowing table: 


TABLE IV O

CUMULATIVE FREQUENCIES OF DAILY MAXIMUM TEMPERATURES
NEW YORK CITY, JULY-AUGUST, 1917

                Proportion of                     Proportion of
                days having                       days having
                lower maximum                     lower maximum
Temperatures    temperatures      Temperatures    temperatures

       ?            .000              81.5            .532
    65.5            .016              82.5            .613
    68.0            .032              83.5            .726
    70.5            .048              84.5            .758
    72.5            .065              85.5            .823
    74.5            .097              86.5            .871
    75.5            .145              87.5            .887
    76.5            .161              89.0            .919
    77.5            .177              92.5            .935
    78.5            .226              95.5            .952
    79.5            .242              97.0            .968
    80.5            .403                 ?           1.000
The plot of these data is shown by the dot
line curve of Chart IV XIII. Special care is
called for in the labeling of the axes. The terms
"ogive" and "percentile" are technical terms and
somewhat undesirable for use in the title of the
chart. They are quite unsatisfactory as axis labels.

CHART IV XIII
DAILY MAXIMUM TEMPERATURES
NEW YORK CITY, JULY-AUGUST 1917
(Figure not reproduced. Axis label: percentage of days having lower maximum temperatures.)

The values for the chart as shown have been
laborious to compute, as there are many of them,
and annoying to plot, as they are not integral
values. It is accordingly economical, and not
misleading, to calculate a small number of
selected percentiles and plot these values only.
We have earlier noted that for a sample of 62,
eight classes are sufficient for graphic
portrayal. Some number of computed percentiles, say
not less than eight, should suffice for a
satisfactory ogive. Let us therefore compute the
following percentiles: 2, 5, 10, 20, 35, 50, 65,
80, 90, 95, 96, and plot the ogive from these.
The values 2 and 96 have been chosen because they
are the smallest and the largest integral
percentiles calculable from the data.

The computation of percentiles: The computation
follows formula [4:01], which is simple to use and
is self-explanatory as soon as the meaning of the
symbols is grasped.

$N$ = the number of cases in the sample.

$P_p$ = the percentile in question (if $p$ = .10,
then $P_{.10}$ = the tenth percentile).

$p$ = the proportion of cases having smaller
values than $P_p$.

$pN$ = the number of cases having smaller values
than $P_p$. If exactly $pN$ cases lie below some
class limit, that limit = $P_p$. (Note, however,
the $P_{.05}$ example following.)

Determine the class wherein the $pN$ measure
lies and let

$v_p$ = the value of the lower limit of this
class.

$f_p$ = the frequency, or number of cases, in
this class.

$i_p$ = the interval, or range, covered by this
class.

$F_p$ = the sum of the frequencies in all classes
below (i.e., classes with smaller $X$ values
than) this class.

Then

$$P_p = v_p + \frac{pN - F_p}{f_p}\, i_p \qquad \text{Value of a percentile [4:01]}$$

This formula is convenient for finding an assigned 
percentile. 
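Formula [4:01] translates directly into a small function. This is a sketch; the parameter names are our own, mirroring the symbols defined above, and the usage line reproduces Example (a) below.

```python
def percentile(p, n, v, f, i, F):
    """Value of the percentile P_p by formula [4:01].

    p -- proportion of cases below the percentile
    n -- number of cases in the sample (N)
    v -- lower limit of the class containing the pN measure (v_p)
    f -- frequency in that class (f_p)
    i -- interval covered by that class (i_p)
    F -- number of cases in all lower classes (F_p)
    """
    return v + (p * n - F) / f * i


# Example (a): the 10th percentile of the 62 daily maxima
print(round(percentile(0.10, 62, v=74.5, f=3, i=1, F=6), 2))   # 74.57
```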
A not infrequent problem is to find the
percentile corresponding to an assigned score. Let
this assigned value be $P_p$. Then every term in
[4:01] except $p$ is known, and [4:01] may be solved
for $p$, thus:

$$p = \frac{F_p + \dfrac{f_p\,(P_p - v_p)}{i_p}}{N} \qquad \text{Proportion of cases falling below } P_p \text{ [4:02]}$$
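Solving [4:01] for $p$ gives formula [4:02]. As a sketch, with our own parameter names; the score of 75.0 in the usage line is a hypothetical value chosen to fall in the class used in Example (a).

```python
def percentile_rank(score, n, v, f, i, F):
    """Proportion of cases falling below an assigned score, formula [4:02].

    score -- the assigned value (P_p)
    n     -- number of cases (N)
    v, f, i, F -- v_p, f_p, i_p, F_p for the class containing the score
    """
    return (F + f * (score - v) / i) / n


# A hypothetical score of 75.0 degrees, in the class 74.5-75.5
print(round(percentile_rank(75.0, 62, v=74.5, f=3, i=1, F=6), 3))   # 0.121
```

Feeding the result of [4:01] back into this function recovers the original $p$, which is a quick check on either computation.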


Example (a), involving no complications: compute
the 10th percentile for the temperature data of
column "i = 1°" of Table IV J and Table IV P.
The computation could also be based upon the
frequencies in column "i = 2°" with satisfactory
results, for the number of classes is then still
greater than 12, but still coarser grouping
should be avoided.

$p$ = .10

$pN$ = 6.2

We observe that the 6.2 measure lies in the
class with class index 75.0 and that

$v_p$ = 74.5, $f_p$ = 3, $i_p$ = 1, $F_p$ = 6.

Accordingly

$$P_{.10} = 74.5 + \frac{.10(62) - 6}{3}\,(1) = 74.57$$


Example (b), wherein the class containing the $pN$
measure is neighboring to one or more classes
with zero frequency: compute $P_{.05}$.

$p$ = .05

$pN$ = 3.1

We observe that the 3.1 measure lies in the
class in Table IV J with class index 71 and that
the two neighboring classes above have zero
frequency. We therefore consider the frequency,
1, found in this class to extend over a stretch
equal to that of this class plus half that of
the neighboring zero classes, so that

$v_p$ = 70.5 and the upper class limit = 72.5,

giving

$f_p$ = 1, $i_p$ = 2, $F_p$ = 3.

Thus

$$P_{.05} = 70.5 + \frac{.05(62) - 3}{1}\,(2) = 70.70$$
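The whole procedure, including the stretching of a class into adjoining empty classes, can be sketched as below. Two assumptions are ours: the frequencies of the lowest one-degree classes are read back from the cumulative counts of Table IV O (Table IV J itself is not reproduced here), and a run of empty classes below the class is treated the same way as a run above it, a case the text's example does not exercise.

```python
def grouped_percentile(p, n, start, width, freqs):
    """P_p by formula [4:01], stretching a class that borders empty classes.

    p      -- proportion of cases below the percentile
    n      -- total number of cases (N); freqs may cover only part of the range
    start  -- lower limit of the first class in freqs
    width  -- common class interval
    freqs  -- class frequencies, lowest class first
    """
    target = p * n                          # the pN measure
    F = 0                                   # F_p: cases in lower classes
    for k, f in enumerate(freqs):
        if f > 0 and F < target <= F + f:   # class k holds the pN measure
            break
        F += f
    else:
        raise ValueError("pN does not fall within the supplied classes")
    below = 0                               # empty classes just below class k
    while k - below - 1 >= 0 and freqs[k - below - 1] == 0:
        below += 1
    above = 0                               # empty classes just above class k
    while k + above + 1 < len(freqs) and freqs[k + above + 1] == 0:
        above += 1
    # The class takes over half of each adjoining stretch of empty classes.
    v = start + k * width - below * width / 2            # v_p
    upper = start + (k + 1) * width + above * width / 2
    return v + (target - F) / f * (upper - v)            # i_p = upper - v


# One-degree classes from 64.5 up to 77.5, as implied by Table IV O
FREQS = [1, 1, 0, 0, 0, 1, 1, 0, 0, 2, 3, 1, 1]
print(round(grouped_percentile(0.05, 62, 64.5, 1, FREQS), 2))   # 70.7
print(round(grouped_percentile(0.10, 62, 64.5, 1, FREQS), 2))   # 74.57
```

Both printed values agree with Examples (a) and (b): in (b) the class 70.5-71.5 absorbs half of the two empty classes above it, giving $i_p$ = 2.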


Having the computed values for percentiles
conveniently spaced from the smallest, $P_{.02}$, to
the largest, $P_{.96}$, the percentile curve may quickly
be plotted, as shown by the full line curve of
Chart IV XIII. Comparison with the dot line
curve shows that this curve, based upon but 11
values, all integral, is entirely adequate.

The issue of the reliability of percentiles
could appropriately be studied after that of
variability, as given in Chapter VI, but to
impress the inescapable intimacy between a
statistic and its error, the standard error of a
percentile is introduced at this point. The
beginning student is not expected to grasp fully
the meaning of a standard error at this point.
After studying Chapter VI he should re-read
this section.
In calculating the standard error of a
percentile it is desirable to use an interval as
large as that recommended for graphic portrayal
and to make the center of the interval coincide
as nearly as possible with the percentile in
question. We therefore define two new constants,
$i'_p$ and $f'_p$:

$i'_p$ = the interval recommended for graphic
portrayal.

$f'_p$ = the frequency in the $i'_p$ interval having
a class index as nearly equal to $P_p$ as
the original grouping (i.e., the data as
first recorded prior to any grouping) of
the data permits.

We also let $q = 1 - p$.

Then,

$$\sigma_{P_p} = \frac{i'_p\,\sqrt{Npq}}{f'_p} \qquad \text{Standard error of a percentile* [4:03]}$$


In the derivation of this, as of all other 
standard error formulas, the elements in the 
right-hand member call for population, or true, 
values. Since these are unavailable, necessity 
dictates that sample values be substituted, with 
the result that the standard error computed does 
itself have a standard error or element of un- 
certainty, but as this usually attaches to more 
remote figures than the first or second, the 
computed value remains a highly important sta- 
tistic. 

Derived in Kelley, 1923, Formula [4-2]. 
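Formula [4:03] as a function; a sketch with our own parameter names. The $f'_p$ and $i'_p$ values in the usage line are hypothetical, chosen only to show the arithmetic, since the text does not give them for the temperature data.

```python
import math


def percentile_standard_error(n, p, f_prime, i_prime):
    """Standard error of P_p by formula [4:03]: i'_p * sqrt(N p q) / f'_p.

    n       -- number of cases (N)
    p       -- proportion of cases below the percentile; q = 1 - p
    f_prime -- frequency in the interval i'_p centered near P_p (f'_p)
    i_prime -- interval recommended for graphic portrayal (i'_p)
    """
    q = 1 - p
    return i_prime * math.sqrt(n * p * q) / f_prime


# Hypothetical figures: N = 62, the median, 20 cases in a 4-degree interval
print(round(percentile_standard_error(62, 0.50, f_prime=20, i_prime=4), 2))  # 0.79
```

Halving $f'_p$ doubles the standard error, which is why percentiles computed in thinly populated stretches near the ends of a distribution are so unreliable.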


GRAPHIC METHODS


section 4. QUALITATIVE SERIES

The qualitative or categorical is the most 
primitive of all statistical series. It conveys 
the minimum of information about the data, in- 
forming one simply that there are different 
classes and different numbers of cases or differ- 
ent values attached to the different classes. 
If certain temporal, geographic, or quantitative 
information inherent in series of these types is 
neglected in studying the data of these types, 
the series degenerate into qualitative series, 
for there still remain different categories and 
attached amounts or frequencies. 

There are two important sub-types of qualitative
data: (1) that in which random sampling of a
population has resulted in observed frequencies
in a number of categorical classes, and (2) that in
which an observation of some specific sort taken
upon a number of categories shows different
values, not numbers of cases, attached to the
respective classes. The second column of Table
IV R (hypothetical) is of the first type, while
the third column is of the second.

TABLE IV R

VOCATIONAL DISTRIBUTION IN NOTOWN OF WORKERS AND OF
CAPITAL LOCALLY INVESTED IN CONNECTION WITH
THEIR BUSINESSES OR EMPLOYMENTS

                     Number of
Vocations            Workers        Capital
Farming                   80       $180,000
Mining                     7         35,000
Transportation             8         10,000
Manufacturing             45        110,000
Selling                   20         40,000
Personal Service          10          5,000
Other                     30         20,000
Total                    200       $400,000

The distinction between these sub-types is
important in connection with computational
procedures, for the reliability of the frequency
in classes obtained as a result of random sampling
is known (see [9:02]), but the reliability of
amounts is not ascertainable without more
information than is commonly available. The reason
for this is that each case (worker employed) is
presumably an independent observation resulting
from a process of enumeration, but each amount
(dollar of capital employed) is presumably not an
observation which is independent of every other
amount. We may not reasonably look upon the
$400,000 as a result of 400,000 measures chosen at
random from an infinite population. If there
exists some town which upon acquaintance is judged
to be generally similar to Notown and which has
200 workers, we may expect to find, within limits
suggested by

$$\sqrt{\frac{200 \times 45 \times 155}{200 \times 200}} \;(= 5.9),$$

the standard deviation of the frequency in a class,
that the number of workers engaged in
manufacturing will be 45; but we should not expect
to find, within limits suggested by

$$\sqrt{\frac{400{,}000 \times 110{,}000 \times 290{,}000}{400{,}000 \times 400{,}000}} \;(= 282),$$

that the amount of capital employed in
manufacturing will be $110,000. This difference is
of great importance in statistical deduction.
It is even important in the reading of a graph.
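The two standard deviations quoted above can be reproduced with one function. This is a sketch; the second call deliberately repeats the text's calculation for dollars in order to show the number it produces, which, as the text explains, is not a meaningful measure of reliability, because dollars are not independent observations.

```python
import math


def class_frequency_sd(total, in_class):
    """Standard deviation of the frequency in one class under random sampling.

    Written as in the text: sqrt(N * f * (N - f) / (N * N)), which is
    sqrt(N p q) with p = f / N and q = 1 - p.
    """
    return math.sqrt(total * in_class * (total - in_class) / (total * total))


workers_sd = class_frequency_sd(200, 45)            # manufacturing workers
dollars_sd = class_frequency_sd(400_000, 110_000)   # manufacturing capital
print(round(workers_sd, 1), round(dollars_sd))      # 5.9 282
```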

Customary devices for representing categorical
data are the bar graph (Chart IV XIV), the
segmented bar, the circle graph or pie chart,
which is a circle divided into sectors (Chart IV
XV), and pictograms representing both the number
of cases in and the nature of categories, as,
e.g., would be done if 80 farmers were represented
by, say, 16 pictures of men with hoes, and seven
miners by 1.4 men of the same size as the farmers
but with miners' caps and lamps, etc. The first
three devices require descriptive labels, but the
pictogram is serviceable for non-readers or others
more inclined to the comics than the editorial page.

CHART IV XIV
WORKERS AND CAPITAL IN NOTOWN
(Figure not reproduced: bar graphs of the number of workers and of capital employed, in $1000, for farming, manufacturing, selling, personal service, transportation, mining, and other.)

CHART IV XV
DISTRIBUTION OF CAPITAL IN NOTOWN
(Figure not reproduced.)
section 5. GEOGRAPHIC SERIES

The feature of the spatial series not shared
by quantitative, qualitative, or temporal series
is that the data are tied to a position in space
which is important for the issues in question
and must not be neglected. If it is neglected, the
series ceases to be spatial and becomes,
generally, a qualitative series. This is commonly
the case when names instead of geographic
positions are given in connection with products of
districts. For example, if the 1941 chicken
production for the counties of California is given
in a table naming the counties and the number of
chickens produced, the reader who does not have
a map of California, showing counties, before
him or in his mind's eye is presented with nothing
but a qualitative series. If this information
is presented on a map it is a geographic series,
which is the most common sub-type under spatial
series. Just as the temporal series for the
daily maximum temperatures in July and August in
New York City ceased to be temporal when the
temperatures were divorced from the particular
dates with which they were originally connected,
so a geographic series ceases to be such when the
items are so presented that the reader does not
attach them to geographic position. For the
existence of a spatial series a spatial metric
(one, two, or three dimensions, as the case may
be) must be employed.

The one- and the three-dimensional spatial
series are unusual but very interesting when
relevant. A survey giving for New York City the
amounts of dust in the air per cubic foot for
different spatial positions, each having a
north-south, an east-west, and a
distance-above-street-level dimension, is an
illustration of a three-dimensional spatial
series. Its representation requires a
three-dimensional model (or a stereoscopic picture)
with the dust density records, by density of a
swarm of points, by figures, by color, or
otherwise, attached to their respective spatial
positions. Numerous illustrations can be drawn
involving sea-life populations and their density
in different locations in the ocean. If we add
seasons of the year, such a series becomes
four-dimensional, with time as the fourth dimension.

The one-dimensional spatial series is
uncommon. A precise illustration of such is to be
found in a chromosome map. The location of genes
in a chromosome is strictly linear. To present
such a series we require a line, straight or
curved provided it does not cross itself, with
the locations of the genes marked thereon. In
this simple case, if the line with the located
genes is not given, but only the names of the
genes, the spatial series disappears and we have
nothing but a qualitative series left.

When we note that debased spatial, quantitative,
and temporal series reappear as qualitative
series, we are led to ask whether qualitative
series which are not recognized as degenerate
series of the other types would not in fact be
found to be such if further information were
available. We may also note in this connection
that the statistics deriving from frequencies and
proportions in classes, appropriate to qualitative
series, though the more primitive in their
nature, are at the same time more universal in
their applicability than the statistics that
serve quantitative series. Both because of this
and because of the loss of information which is
commonly incurred when data are merely thrown into
categories, the student should not be content with
qualitative data except in case the data cannot be
cast into one of the other types.

It is somewhat difficult to classify, as two-
or as three-dimensional, data distributed over
the surface of the globe. Since the data all
lie upon the surface, a much more limited type
exists than the general three-dimensional spatial
series. Certainly if we are concerned with so
small a portion of the surface of the sphere that
it can approximately be represented by a plane,
we have a two-dimensional spatial series. This
is the most common situation, and we shall think
of geographic series as essentially
two-dimensional spatial series.

In dealing with data attached to the
forty-eight states of the United States we should
follow the tabulating practice of the U. S.
Census. This is shown in the stub of Table IV S.
If we desire to present the population of the
states as a geographic series we must represent
the population figures upon a map in their
appropriate positions, by writing the population
figures, by varied cross-hatching, by shading,
or by stippling. In these last three cases the
greater the density of population, the heavier
the hatching, shading, or stippling. Shading or
stippling generally yields the more accurate
visual impression.

The population data just mentioned may be
called a quantitative geographical series because
different quantities of a single thing are
attached to the different geographical positions.
Should the chief products of the states be shown
on a map (a steer being drawn on Texas, a shock
of wheat on Kansas, etc.), we have a qualitative
geographic series. This latter is very common
in popular portrayal and is serviceable if there
is no attempt to convey other than qualitative
information. Of course the veriest tyro is aware
that such a presentation fails to reveal
important quantitative aspects that exist.

The most primitive geographic series merely 
shows that different regions exist. Such regions 
are defined by the boundaries given on the map. 
TABLE IV S

CERTAIN POPULATION, AREA, AND SCHOOL ATTENDANCE
STATISTICS OF THE UNITED STATES, BY STATES, 1940*

                                             PERSONS OF AGE 16-20
                     POPULATION      AREA IN TOTAL NO.   ATTENDING     PER
DIVISION AND STATE    IN 1000's SQUARE MILES IN 1000's      SCHOOL    CENT
                                                       (IN 1000's)

UNITED STATES           131,659    2,977,128    12,278       5,106    41.5
NEW ENGLAND               8,437       63,206       761         329    43.2
  Maine                     847       31,040        76          31    41.2
  New Hampshire             492        9,024        43          17    41.0
  Vermont                   359        9,278        32          13    41.1
  Massachusetts           4,317        7,907       384         175    45.6
  Rhode Island              713        1,058        68          24    35.1
  Connecticut             1,709        4,899       157          67    42.8
MIDDLE ATLANTIC          27,539      100,496     2,475       1,089    44.0
  New York               13,479       47,929     1,135         526    46.3
  New Jersey              4,160        7,522       379         155    40.8
  Pennsylvania            9,900       45,045       961         409    42.6
EAST NORTH CENTRAL       26,626      245,011     2,357       1,047    44.4
  Ohio                    6,908       41,122       621         288    46.3
  Indiana                 3,428       36,205       308         134    43.5
  Illinois                7,897       55,947       677         291    43.0
  Michigan                5,256       57,022       471         203    43.0
  Wisconsin               3,138       54,715       280         131    47.0
WEST NORTH CENTRAL       13,517      510,621     1,235         533    43.2
  Minnesota               2,792       80,009       256         113    44.3
  Iowa                    2,538       55,986       230          98    42.7
  Missouri                3,785       69,270       331         126    37.9
  North Dakota              642       70,054        65          28    43.2
  South Dakota              643       76,535        64          30    46.6
  Nebraska                1,316       76,653       123          56    45.2
  Kansas                  1,801       82,113       166          83    50.0
SOUTH ATLANTIC           17,823      268,431     1,851         619    33.4
  Delaware                  267        1,978        24           9    38.3
  Maryland                1,821        9,887       169          56    32.9
  Dist. of Columbia         663           61        51          24    46.9
  Virginia                2,678       39,899       280          93    33.1
  West Virginia           1,902       24,090       200          73
  North Carolina          3,572       49,142       401         132
  South Carolina          1,900       30,594       224          59
  Georgia                 3,124       58,518       330          99
  Florida                 1,897       54,262       170          65
EAST SOUTH CENTRAL       10,778      180,568     1,106         374
  Kentucky                2,846       40,109       287          81
  Tennessee               2,916       41,961       293         100
  Alabama                 2,833       51,078       297         107
  Mississippi             2,184       47,420       228          87
WEST SOUTH CENTRAL       13,065      430,829     1,305         505
  Arkansas                1,949       52,725       201          72
  Louisiana               2,364       45,177       239          82
  Oklahoma                2,336       69,283       236         109
  Texas                              263,644
MOUNTAIN                  4,150      857,836
  Montana                   559      146,316
  Colorado
  New Mexico
  Arizona
PACIFIC
  Washington
  California

* Abstract of the Sixteenth Census of the U.S., 1940, pp. 45 and 76;
Population, Second Series, Characteristics of the Population,
United States Summary.
To make these regions more easily observed they
may be colored differently. It is not necessary
to employ as many colors as there are regions,
for non-contiguous regions may be colored the
same without confusion. Experience, rather than
mathematical deduction, shows that if the region
to be mapped is the globe or any portion thereof,
four colors will always suffice to meet the
requirement that no two contiguous regions shall
be colored the same. In any actual case it may
call for trial and retrial to find the proper
order of coloring to ensure that this minimal
number four suffices.
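The trial and retrial of colorings can be made systematic by backtracking: color the regions one at a time and step back whenever a region has no color left that differs from all its neighbors. The sketch below is one such procedure, not the only one; the list of New England land borders is supplied for illustration and is not from the text.

```python
def color_map(neighbors, colors=("red", "yellow", "green", "blue")):
    """Color regions so that no two contiguous regions share a color.

    neighbors -- dict: region -> set of regions it borders
    colors    -- the colors available (four by default)
    Returns a dict region -> color, or None if the colors do not suffice.
    """
    # Regions with the most neighbors are hardest to fit, so place them first.
    order = sorted(neighbors, key=lambda r: len(neighbors[r]), reverse=True)
    assignment = {}

    def place(k):
        if k == len(order):
            return True
        region = order[k]
        for color in colors:
            if all(assignment.get(nb) != color for nb in neighbors[region]):
                assignment[region] = color
                if place(k + 1):
                    return True
                del assignment[region]      # retreat and try the next color
        return False

    return dict(assignment) if place(0) else None


# Land borders among the New England states (supplied for illustration)
BORDERS = [("Maine", "New Hampshire"), ("New Hampshire", "Vermont"),
           ("New Hampshire", "Massachusetts"), ("Vermont", "Massachusetts"),
           ("Massachusetts", "Rhode Island"), ("Massachusetts", "Connecticut"),
           ("Rhode Island", "Connecticut")]
NEIGHBORS = {}
for a, b in BORDERS:
    NEIGHBORS.setdefault(a, set()).add(b)
    NEIGHBORS.setdefault(b, set()).add(a)

coloring = color_map(NEIGHBORS)
print(coloring)
```

With only two colors the function returns None, since New Hampshire, Vermont, and Massachusetts all touch one another.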

Frequently the map with the recorded phenomena
upon it constitutes the end product, or the
terminal geographic statistic. However,
computational techniques are appropriate to the
handling of important issues that commonly arise.
When the nearly waste land that is now Gary,
Indiana, was established as a steel center, it
was because of its strategic geographic location.
To be considered were the costs of bringing ore
from Superior and coal from southern Illinois, and
of distributing finished products to railways
and industrial plants in the Middle West and
beyond. Taking these and other necessary
considerations into account, there existed some
location that would minimize costs and maximize
profits. Gary, after a careful survey, was judged
to be it. The problems of geographic series may
become very complex, and they arise in many
situations of business and social intercourse.

The school superintendent recommending the
building of a new school should do so primarily
to minimize the distance and the hazards that
pupils are subjected to in going from home to
school. Of the available locations, undoubtedly
some one better accomplishes these things than
any other. The basic treatment of this problem
requires first of all a plotting of a school
population map, as estimated to exist at a future
date corresponding to one-half the life of the
school building to be constructed. Upon this map
lines may be drawn representing the hazards, e.g.,
more lines for railway crossings and boulevards
than for residential streets. Then each site may
be tested by making computations giving the sum
of pupil distances from school and the sum of
hazards crossed. If some such combining concept
as "one-half mile of distance is equally
disadvantageous to crossing one boulevard" is
adopted, distance debits and hazard debits may be
combined and a single figure obtained representing
the total debit to be attached to a site. A
comparison of sites by this means will disclose the
site having minimal distance and hazard debits.
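The site comparison can be sketched as a small computation. Everything in the example is hypothetical: the pupils' homes, the two candidate sites, and a single boulevard. Distances are taken as straight lines, an assumption a real survey would replace with walking distances along streets; the exchange rate of one-half mile per boulevard is the text's example.

```python
import math


def site_debit(site, homes, hazards_crossed, miles_per_hazard=0.5):
    """Total debit of a candidate school site, in mile-equivalents.

    site             -- (x, y) position of the site, in miles
    homes            -- (x, y) positions of the pupils' homes, in miles
    hazards_crossed  -- function (home, site) -> number of hazards on the way
    miles_per_hazard -- exchange rate: one hazard counts as this many miles
    """
    total = 0.0
    for home in homes:
        total += math.dist(home, site)                           # distance debit
        total += miles_per_hazard * hazards_crossed(home, site)  # hazard debit
    return total


# Hypothetical neighborhood: one north-south boulevard along x = 1 mile
def boulevards_crossed(home, site, boulevards=(1.0,)):
    """Count boulevards lying between home and site (hypothetical layout)."""
    return sum((home[0] - b) * (site[0] - b) < 0 for b in boulevards)


HOMES = [(0, 0), (0.5, 0.5), (2, 0), (2, 1), (2.5, 0.5)]
SITES = {"A": (0.5, 0.25), "B": (1.5, 0.5)}
debits = {name: site_debit(xy, HOMES, boulevards_crossed)
          for name, xy in SITES.items()}
best = min(debits, key=debits.get)
print(best, round(debits[best], 2))   # B 6.0
```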

The procedure suggested is closely related to 
that of determining the center of population. 
Herewith is a brief table, taken from Read (1939), 
of centers in the United States of certain popu- 
lations. 


TABLE IV T

THE CENTERS OF POPULATION OF CERTAIN LEARNED
GROUPS IN THE UNITED STATES

                                     NO. OF        CENTER OF POPULATION     NEAREST
ORGANIZATION                         INDIVIDUALS   LATITUDE   LONGITUDE     LARGE CITY
College enrollment                   885,282       39°19'     84°45'        Cincinnati
Am. Chem. Soc. by sections           17,469        40°5'      83°49'        Dayton
Am. Historical Society                             39°54'     82°13'        Columbus
Math. Assn. of America               1,928         39°32'     84°57'        Cincinnati
Am. Assn. of Petroleum Geologists                  35°6'      98°12'        Oklahoma City
Am. Psychological Assn.              2,269         40°15'     83°35'        Columbus
Am. Assn. of University Professors   11,165                   84°27'        Cincinnati-Dayton
Am. Assn. for the Advancement        17,141        39°41'     84°10'        Dayton
  of Science
The method of calculation of such centers
followed, with minor modification, the general
procedure of the United States Census in its
computation of the center of the United States
population, as described in full on page 20, Volume
II, of the 15th Census of the U. S., and as here
quoted in abridged form:

"The center of population may be said to 
represent the center of gravity of the population. 
If the surface of the United States be considered 
as a rigid level plane without weight and the 
population distributed thereon, all individuals 
being assumed to have equal weight, the point on 
which this plane would balance would be the 
center of population. . . . 

"In making the computations for the location 
of the center of population it is necessary to 
assume that the center is at a certain point. 
Through this point a parallel and a meridian are 
drawn, crossing the entire country. . . . 

"The product of the population of a given 
area by its distance from the assumed parallel 
is called a north or south moment, and the prod- 
uct of the population of the area by its distance 
from the assumed meridian is called an east or 
west moment. . . . 

"The population of each square degree north 
and south of the assumed parallel is multiplied 
by the distance of its center from that parallel; 
a similar calculation is made for the principal 
cities; and the sum of the north moments and the 
sum of the south moments are ascertained. The 
difference between these two sums, divided by 
the total population of the country, gives a
correction to the latitude. In a similar manner 
the sum of the east and of the west moments are 
ascertained and from them the correction in 
longitude is made." 
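The moment procedure quoted above can be sketched in a few lines. This version simplifies the Census method: positions are plane coordinates in miles rather than degrees, and the three areas in the usage lines are hypothetical. Note that the corrected center does not depend on the trial point chosen.

```python
def center_by_moments(trial, areas):
    """Locate a center of population by the Census method of moments.

    trial -- (north, east) position of the assumed center
    areas -- (population, north, east) for each area, positions measured
             in the same units (here miles) and from the same origin
    North and east moments count positive, south and west negative; each
    net moment divided by the total population corrects the trial point.
    """
    total = sum(pop for pop, _, _ in areas)
    ns_moment = sum(pop * (north - trial[0]) for pop, north, _ in areas)
    ew_moment = sum(pop * (east - trial[1]) for pop, _, east in areas)
    return trial[0] + ns_moment / total, trial[1] + ew_moment / total


# Three hypothetical areas; any trial point leads to the same center
AREAS = [(100, 10, -20), (300, -5, 4), (600, 2, 10)]
print(center_by_moments((0, 0), AREAS))
print(center_by_moments((3, -1), AREAS))
```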
As a simple center of population problem let
us consider the following, which we will call
the Problem of the Statisticians' Picnic: Twenty
Massachusetts statisticians with residences as
given in Table IV U decide to have a picnic and,
without prior determination, agree that the site
shall be the center of population of the twenty.

TABLE IV U

PICNICKING STATISTICIANS OF MASSACHUSETTS

NAME OF          PLACE OF              NAME OF          PLACE OF
STATISTICIAN     RESIDENCE             STATISTICIAN     RESIDENCE
Adams            N. Adams              King             Gloucester
Pitt             Pittsfield            Bridge           Cambridge
Barr             Great Barrington      Bean             Boston
Spring           Springfield           Fern             Boston
Green            Greenfield            Ivy              Boston
Woo              Worcester             Rock             Plymouth
Fitch            Fitchburg             Town             Provincetown
Ames             Amesbury              Ford             New Bedford
Lawrence         Lawrence              Wood             Woods Hole
Lowell           Lowell                Wells            Wellesley


Where is this Mecca? We solve this problem by
taking measurements on a map of Massachusetts,
Chart IV XVI. We choose some arbitrary point,
say Wellesley, somewhere near the center of
population, and draw a meridian and a parallel
through it. Assuming a flat surface and using the
scale of the map and a pair of calipers, we
determine the north-south and the east-west
coordinates of each of the twenty residences. The
figures of Table IV V result.

CHART IV XVI


TABLE IV V

NORTH-SOUTH AND EAST-WEST COORDINATES
OF RESIDENCES FROM WELLESLEY
(IN MILES; NORTH AND EAST POSITIVE)

                  N-S        E-W          (N-S          (E-W
NAME         DISTANCE   DISTANCE    DISTANCE)²    DISTANCE)²
Adams              28        -93           784          8649
Pitt               10       -100           100         10000
Barr               -7       -106            49         11236
Spring            -14        -66           196          4356
Green              20      -67.5           400       4556.25
Woo                -3        -26             9           676
Fitch              19      -25.5           361        650.25
Ames               38         22          1444           484
Lawrence           28        7.5           784         56.25
Lowell             23       -0.5           529          0.25
King               22         32           484          1024
Bridge              4        9.5            16         90.25
Bean              3.5       11.5         12.25        132.25
Fern              3.5       11.5         12.25        132.25
Ivy               3.5       11.5         12.25        132.25
Rock            -23.5         31        552.25           961
Town            -16.5         56        272.25          3136
Ford              -44         19          1936           361
Wood            -53.5         32       2862.25          1024
Wells               0          0             0             0
Sums               41       -241       10815.5         47657
Means            2.05     -12.05       540.775       2382.85

Computing sums and means we find that the 
center of population of these twenty is 2.05 
miles north and 12.05 miles west of Wellesley. 
Had the faith of the statisticians in their sub- 
ject been wavering, it would have been revived 
as they found their rendezvous, — a beautiful 
little island in the middle of a fine body of 
water. 
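The sums and means of Table IV V can be checked with a short computation. This is a sketch using the coordinates listed in the table, with north and east counted positive.

```python
# (north, east) offsets in miles from Wellesley, as in Table IV V;
# north and east count positive, south and west negative
RESIDENCES = {
    "Adams": (28, -93), "Pitt": (10, -100), "Barr": (-7, -106),
    "Spring": (-14, -66), "Green": (20, -67.5), "Woo": (-3, -26),
    "Fitch": (19, -25.5), "Ames": (38, 22), "Lawrence": (28, 7.5),
    "Lowell": (23, -0.5), "King": (22, 32), "Bridge": (4, 9.5),
    "Bean": (3.5, 11.5), "Fern": (3.5, 11.5), "Ivy": (3.5, 11.5),
    "Rock": (-23.5, 31), "Town": (-16.5, 56), "Ford": (-44, 19),
    "Wood": (-53.5, 32), "Wells": (0, 0),
}


def center_of_population(points):
    """Mean north and mean east offsets: the center of population."""
    n = len(points)
    north = sum(p[0] for p in points) / n
    east = sum(p[1] for p in points) / n
    return north, east


north, east = center_of_population(list(RESIDENCES.values()))
print(f"{north:.2f} miles north, {-east:.2f} miles west of Wellesley")
```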

The center of population defined in line with
the computations made has the important property
of minimizing the sum of the squared distances,
or the quadratic mean distance. For the picnic
problem we find this to be

$$\sqrt{540.775 - (2.05)^2 + 2382.85 - (12.05)^2} = \sqrt{2774.22} = 52.67 \text{ miles}.$$
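The quadratic mean distance follows from the means in the last line of Table IV V, since in each direction the mean square less the squared mean is the mean squared deviation about the center. A sketch:

```python
import math


def quadratic_mean_distance(mean_n, mean_e, mean_n_sq, mean_e_sq):
    """Root-mean-square distance of the cases from their center of population.

    In each direction, the mean square minus the squared mean is the
    mean squared deviation about the center; the two are added because
    squared distance = (north difference)^2 + (east difference)^2.
    """
    return math.sqrt(mean_n_sq - mean_n ** 2 + mean_e_sq - mean_e ** 2)


# Means from the last line of Table IV V
print(round(quadratic_mean_distance(2.05, -12.05, 540.775, 2382.85), 2))  # 52.67
```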

Should the desire be to minimize the mean, or
the median, or the modal distance, other
definitions and other computational procedures would
be necessary. Concerning the modal distance we
may note that if the place of greatest density
is chosen, the number of cases at zero distance
from the locus is maximized; but if the frequency
of cases at distance x (x ≠ 0) is to be maximized,
some point (or points) other than this place of
densest population will in general constitute the
solution.
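For the mean distance, one standard computational procedure (not given in the text) is Weiszfeld's iteration for the point of least total distance, often called the geometric median. The sketch below applies it to the coordinates of Table IV V on the same flat-map assumption; skipping a residence when the estimate lands exactly on it is a simplification of the full method.

```python
import math

# (north, east) miles from Wellesley for the twenty residences of Table IV V
POINTS = [(28, -93), (10, -100), (-7, -106), (-14, -66), (20, -67.5),
          (-3, -26), (19, -25.5), (38, 22), (28, 7.5), (23, -0.5),
          (22, 32), (4, 9.5), (3.5, 11.5), (3.5, 11.5), (3.5, 11.5),
          (-23.5, 31), (-16.5, 56), (-44, 19), (-53.5, 32), (0, 0)]


def mean_distance(points, center):
    """Mean straight-line distance from center to the points."""
    return sum(math.dist(p, center) for p in points) / len(points)


def geometric_median(points, iterations=1000, tol=1e-10):
    """Point of least mean distance, by Weiszfeld's iteration.

    Starts at the center of population (the coordinate means) and
    repeatedly replaces the estimate by the average of the points,
    each weighted by the reciprocal of its distance from the estimate.
    """
    x = sum(p[0] for p in points) / len(points)
    y = sum(p[1] for p in points) / len(points)
    for _ in range(iterations):
        wx = wy = wsum = 0.0
        for px, py in points:
            d = math.hypot(px - x, py - y)
            if d < tol:            # estimate sits on a residence: skip it
                continue           # (a simplification of the full method)
            wx += px / d
            wy += py / d
            wsum += 1 / d
        new_x, new_y = wx / wsum, wy / wsum
        if math.hypot(new_x - x, new_y - y) < tol:
            return new_x, new_y
        x, y = new_x, new_y
    return x, y


centroid = (sum(p[0] for p in POINTS) / 20, sum(p[1] for p in POINTS) / 20)
median = geometric_median(POINTS)
print(round(mean_distance(POINTS, centroid), 2),
      round(mean_distance(POINTS, median), 2))
```

The two printed figures compare the mean distance from the center of population with the mean distance from the point that minimizes it.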

If the geographic phenomena studied are con- 
tinuous and show moderate rates of change over 
the region studied, they can be excellently 
portrayed by means of contour lines, as is done 
in isothermal and isobaric weather maps and in 
Coast and Geodetic Survey maps giving land ele- 
vations and sea depths. Military maps of terrain 
illustrate a most precise portrayal of a geo- 
graphic series. 
Chart IV XVII, taken with permission from
Birth Control Review, Vol. 17, No. 4, attempts
to depict the status of sterilization laws in the
United States in 1933. It is not entirely
satisfactory, for the states shown in black do not
constitute a uniform class. There is, of course,
wide diversity in the statutes in these several
states.


CHART IV XVII 

STERILIZATION LEGISLATION IN THE UNITED STATES 



The mapping on a two-dimensional sheet of so
large a portion of the surface of the globe that
it cannot be represented satisfactorily by a
plane follows different practices depending upon
which feature, or features, it is most important
to portray correctly.

The Mercator projection of the surface of the
globe is commonly pictured as obtained by
circumscribing the globe with a thin paper
cylinder which touches it at the equator and
projecting the points on the globe onto the
outside of the cylinder by lines that pass through
the center of the globe. Strictly, that
construction gives the central cylindrical
projection; Mercator's projection uses the same
cylinder but spaces the parallels by a
mathematical rule. This projection has the
important property of preserving all compass
directions, as is important to the navigator, but
areas and distances are not preserved and are, in
the arctic regions, greatly exaggerated.
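The difference between the two cylindrical constructions can be checked numerically. The formulas below are the standard spherical ones, on a globe of radius 1. On both maps a parallel at latitude φ is stretched east-west by sec φ, and compass directions are preserved only if the north-south stretch is the same; the sketch shows that this holds for Mercator's spacing but not for projection from the center.

```python
import math


def central_cylindrical_y(lat_deg):
    """Height on the cylinder (globe radius 1) when projecting from the center."""
    return math.tan(math.radians(lat_deg))


def mercator_y(lat_deg):
    """Mercator's spacing of the parallels (globe radius 1): ln tan(45° + lat/2)."""
    return math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))


def north_south_stretch(y_of_lat, lat_deg, h=1e-5):
    """Map length per unit of globe length along a meridian (numerical derivative)."""
    return (y_of_lat(lat_deg + h) - y_of_lat(lat_deg - h)) / (2 * math.radians(h))


for lat in (0, 30, 60):
    east_west = 1 / math.cos(math.radians(lat))   # both maps: sec(lat)
    print(lat, round(east_west, 3),
          round(north_south_stretch(mercator_y, lat), 3),
          round(north_south_stretch(central_cylindrical_y, lat), 3))
```

At 60° the east-west stretch is 2; Mercator's north-south stretch matches it, while projection from the center gives 4, so shapes and directions there are distorted.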

An orthographic projection is obtained by pro- 
jecting the points of the surface of one-half the 
globe onto a plane which touches the globe at one 
point (the center of the map), — the projecting 
lines being perpendicular to the plane. Two such 
circle maps are commonly found in an elementary 
geography to picture the entire global surface, 
but distances and areas are greatly distorted, 
except in the neighborhoods of the two centers. 

Another mapping designed to serve a single 
spot is an azimuthal equidistant projection. If 
Washington were made the center of such a map the 
rest of the globe would be so presented that all 
distances from Washington would be correctly indi- 
cated by the straight-line distances from Wash- 
ington. Thus all places 1000 miles away would 
lie on a circle of indicated radius 1000 miles, 
having its center at Washington. A dimension at 
right angles to the distance-from-Washington
dimension is more and more exaggerated as the 
distance from Washington becomes greater, but the 
fact that all straight lines through Washington 
correspond to great circles on the globe through 
Washington is a merit when the issue is distance 
from Washington. A company engaged in inter- 
national trade might well desire such a map with 
its headquarters, or producing plant, as the 
center. 
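The distance-preserving property can be made concrete with a short Python sketch. It assumes a spherical earth of mean radius 3959 miles, and the function name `azimuthal_equidistant` is illustrative rather than taken from any mapping library. Each place is plotted in the direction of its initial compass bearing from the center, at a map distance equal to its great-circle distance, so every place 1000 miles from the center falls on the circle of radius 1000.

```python
import math

EARTH_RADIUS_MILES = 3959.0  # assumed mean radius of a spherical earth

def azimuthal_equidistant(lat0, lon0, lat, lon, radius=EARTH_RADIUS_MILES):
    """Plot (lat, lon) on an azimuthal equidistant map centered at (lat0, lon0).

    Angles are in degrees; returns (x, y) in miles, x pointing east and y north
    at the center. The plotted distance from the center equals the great-circle
    distance on the globe, and the direction is the initial compass bearing.
    """
    phi0, lam0 = math.radians(lat0), math.radians(lon0)
    phi, lam = math.radians(lat), math.radians(lon)
    dphi, dlam = phi - phi0, lam - lam0
    # Central angle by the haversine formula (well behaved for short distances).
    h = math.sin(dphi / 2) ** 2 + math.cos(phi0) * math.cos(phi) * math.sin(dlam / 2) ** 2
    h = min(1.0, h)  # guard against rounding just above 1 for antipodal points
    central_angle = 2 * math.atan2(math.sqrt(h), math.sqrt(1 - h))
    # Initial bearing from the center, measured clockwise from north.
    bearing = math.atan2(
        math.sin(dlam) * math.cos(phi),
        math.cos(phi0) * math.sin(phi) - math.sin(phi0) * math.cos(phi) * math.cos(dlam),
    )
    distance = radius * central_angle
    return distance * math.sin(bearing), distance * math.cos(bearing)

# Washington is taken, approximately, as 38.9 N, 77.0 W (an assumption for the example).
x, y = azimuthal_equidistant(38.9, -77.0, 51.5, -0.1)  # a point near London
print(round(math.hypot(x, y)))  # straight-line map distance from Washington, in miles
```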

A gnomonic projection is one constructed with
the center of the sphere as the vantage point
from which points are projected onto a plane
touching the globe, though the reader views the
finished map from outside the sphere. Only a
portion of the globe, less than a hemisphere,
can be so represented. The merit of this
projection is that, for the portion shown, the
straight line connecting any two points (not
merely the center and a second point)
corresponds to the great circle on the globe
connecting these two points.
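A minimal Python sketch of the gnomonic construction, treating it as a projection from the globe's center onto the plane tangent at the map center (a unit globe and the helper names are assumptions for illustration). Three points on a meridian, which is a great circle, come out on one straight line; three points on a parallel of latitude other than the equator, which is not a great circle, do not.

```python
import math

def unit_vector(lat, lon):
    """Unit vector from the globe's center to (lat, lon), both in degrees."""
    phi, lam = math.radians(lat), math.radians(lon)
    return (math.cos(phi) * math.cos(lam), math.cos(phi) * math.sin(lam), math.sin(phi))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def gnomonic(center_lat, center_lon, lat, lon):
    """Project (lat, lon) from the globe's center onto the plane tangent at the map center.

    Returns (x east, y north) on a unit globe. Points 90 degrees or more from the
    center cannot be shown: their projecting line never meets the tangent plane.
    """
    c = unit_vector(center_lat, center_lon)
    p = unit_vector(lat, lon)
    k = dot(p, c)
    if k <= 0:
        raise ValueError("point is 90 degrees or more from the map center")
    q = tuple(pi / k for pi in p)  # where the line from the center through p meets the plane
    phi0, lam0 = math.radians(center_lat), math.radians(center_lon)
    east = (-math.sin(lam0), math.cos(lam0), 0.0)
    north = (-math.sin(phi0) * math.cos(lam0), -math.sin(phi0) * math.sin(lam0), math.cos(phi0))
    d = tuple(qi - ci for qi, ci in zip(q, c))
    return dot(d, east), dot(d, north)

def collinearity(p1, p2, p3):
    """Twice the signed area of triangle p1 p2 p3; zero when the points lie on one line."""
    return (p2[0] - p1[0]) * (p3[1] - p1[1]) - (p2[1] - p1[1]) * (p3[0] - p1[0])

# Three points on the meridian 20 E (a great circle) map to a straight line ...
on_meridian = [gnomonic(40.0, 0.0, lat, 20.0) for lat in (10.0, 30.0, 50.0)]
# ... but three points on the parallel 30 N (not a great circle) do not.
on_parallel = [gnomonic(40.0, 20.0, 30.0, lon) for lon in (0.0, 20.0, 40.0)]
print(collinearity(*on_meridian), collinearity(*on_parallel))
```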

There are mappings which can be made of all, 
or portions of, the globe so as to preserve areas. 
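One such is the Aitoff equal-area projection map
(strictly, the equal-area form is Hammer's
modification of Aitoff's projection, often called
the Hammer-Aitoff).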
It is elliptical in shape, with the equatorial 
dimension twice that of the inter-polar dimension. 
Another is the sinusoidal interrupted projection, 
which consists of a succession of lens-shaped 
parts whose touching points lie along the equator.
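The 2:1 proportion can be checked with the Hammer formulas, used here on the assumption that they represent the equal-area map the text has in mind (the function name `hammer` and the unit globe are illustrative). The sketch also confirms that the whole ellipse has the area of the globe itself, 4π on a unit sphere, as an equal-area map must.

```python
import math

def hammer(lat, lon):
    """Hammer equal-area projection of (lat, lon), in degrees, for a globe of unit radius."""
    phi, lam = math.radians(lat), math.radians(lon)
    denom = math.sqrt(1.0 + math.cos(phi) * math.cos(lam / 2.0))
    x = 2.0 * math.sqrt(2.0) * math.cos(phi) * math.sin(lam / 2.0) / denom
    y = math.sqrt(2.0) * math.sin(phi) / denom
    return x, y

half_width = hammer(0.0, 180.0)[0]   # equator at the map's edge: 2 * sqrt(2)
half_height = hammer(90.0, 0.0)[1]   # either pole: sqrt(2)
ellipse_area = math.pi * half_width * half_height
print(half_width / half_height)       # equatorial to inter-polar dimension, 2 to rounding
print(ellipse_area / (4 * math.pi))   # map area over globe area, 1 to rounding
```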

Though the globe is the unbiased authority for
spherical relationships, these sundry
two-dimensional maps still serve well the special
problems involving direction, distance, or area,
or a particular point of reference.

SECTION 6. GRAPHIC PORTRAYAL IN THREE DIMENSIONS

The two-dimensional base is serviceable in 
other than geographic problems. These essentially 
are situations involving correlated variables, in 
which the frequency of occurrence varies depending 
upon the joint values of two variables. If the 
variables are quantitative we have the common 
scatter diagram underlying simple correlation. 
If the data are qualitative we have the common 
contingency table underlying coefficients of
contingency and χ² tests of independence of
variables. In either of these cases it may be
serviceable to present the data graphically by 
means of a block diagram. 
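For readers who wish to see the arithmetic behind the chi-square (χ²) test of independence mentioned above, the following Python sketch computes χ² from the frequencies expected under independence, together with Pearson's coefficient of contingency, C = √(χ²/(χ² + N)). The function names and the small table of counts are hypothetical, and every row and column total is assumed to be positive.

```python
import math

def chi_square(table):
    """Pearson chi-square for independence in a contingency table (list of rows of counts)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # frequency expected if independent
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n

def contingency_coefficient(table):
    """Pearson's coefficient of contingency C = sqrt(chi2 / (chi2 + N))."""
    chi2, n = chi_square(table)
    return math.sqrt(chi2 / (chi2 + n))

# Hypothetical 2 x 2 table of counts, for illustration only.
example = [[10, 20], [20, 10]]
print(chi_square(example))               # chi-square of 20/3, with N = 60
print(contingency_coefficient(example))  # the square root of 0.1
```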

A rectangular parallelopiped, for brevity here 
called a block, has three dimensions. As it 
stands upon a flat surface the two dimensions in 
contact with the surface may represent the two 
different variables and the height of the block 
can represent the frequency of the joint oc- 
currence. For accurate portrayal the blocks must 
be drawn in perspective, and the vantage point
from which the blocks are observed must be such
that at least a part of each block is visible.
If some block is very low, i.e., the
corresponding frequency is small, so that it is a
well surrounded by towering neighboring blocks,
or at least touches towering neighboring blocks
in front of it, the vantage point must be at a
high elevation in order to glimpse any portion of
this block. However, just as hilly country looks
flat when viewed from a high-flying aeroplane, so
distinctions in the elevations of blocks are
difficult to see if the vantage point is very
high. The requirements of a low vantage point and
complete visibility are frequently antagonistic.
It is accordingly true that many data cannot well
be represented by means of a block diagram. If
it is possible to see every block from a
relatively low vantage point, it may be expected
that presentation by means of a block diagram
will be clear and informative. A very common
situation arises where only one of the two
variables is qualitative, the other being either
temporal or quantitative. In this case the
temporal, or quantitative, order must be
preserved in the corresponding dimension, but the
order in the qualitative variable can be chosen
for lucidity of presentation and for no other
reason. The data of Table IV W, as represented in
Chart IV XVIII, illustrate this case.

The successive steps in the construction of a
block diagram will be illustrated by means of
the data of Table IV W.

The states are listed in alphabetic order in 
Table IV W. It is not important to preserve this,
and to do so would increase the difficulties of
presentation. An arrangement of states based 
upon the populations in 1930 is suggested, but 
generally a preferable one, to avoid hidden 
Pennsylvania, Illinois, Ohio, Texas, Massachusetts,
Michigan, California. Since there are
eight states and only four dates, it will make 
for ease of lettering and reading if the axes are 
arranged as shown in Chart IV XVIII rather than 
in the reverse manner. A good general plan is to 
build up by means of sugar cubes, dominoes, or 
other blocks, an approximately correct structure, 
and then survey it from all angles until the 
vantage point is found yielding the clearest 
picture. 

The actual steps showing the necessary con- 
struction lines involved in the construction of 
the rearmost block are shown in Chart IV XIX. 
In order they are as follows: 

Draw horizontal line AB. 

At some point, O, on it erect perpendicular CO.

At some point, D, whose height depends upon 
the height of the vantage point, but is in general 
higher than the highest block, draw horizon line 
FE parallel to AB. 

On this line choose some point, E, for the 
right vanishing point and draw OE. 

Choose some point, F, for the left vanishing 
point and draw OF. Angle EOF depends upon the 
distance of the vantage point above and in front 
of point O: the greater the height, the smaller
the angle, and the greater the distance in front,
the larger the angle. This angle of necessity
must be greater than 90° and less than 180°. An
angle of 110° is frequently satisfactory, but a 
better estimate of it can be gotten by observing 
the angle in the domino structure when viewed 
from the desired point. 

On OB lay off conveniently equally spaced 
intervals for the right horizontal scale. 

On OA lay off conveniently equally spaced 
intervals for the left horizontal scale. 

On OC lay off conveniently equally spaced 
intervals for the vertical scale, which must be 
such that the value for the tallest block falls
below point D if it is desired to see the top of 
this block. The fact that the top of the tallest 
block cannot be seen in Chart IV XVIII is some- 
what of a disadvantage. 

Draw a grid representing the bases of the 
blocks by drawing lines from the division points 
on OB to F and from the division points on OA to E. 

Draw the most remote block first, erecting 
perpendiculars from the four corners (three if 
the block extends above the horizon) of its base. 

Erect HG, a perpendicular at G, the front 
corner of the rear tier of blocks. 

On OC mark the proper height (12.59) for this 
most remote block. 

Draw a line from this point (12.59) to F, cut- 
ting HG at I. 

From I draw a line, ILME, to E, cutting JK at 
L, which is the upper front corner of the most 
remote block. LM is the upper right front edge
and LN, a part of LF, is the upper left front
edge. If the block extends above the horizon,
these two edges are all that can be seen, but if
it lies below the horizon the two rear edges are
also to be drawn, as shown by NT', a part of NT,
and M'P', a part of M'F, had this rear block been
6.00 high instead of 12.59.

By similar procedure the remaining blocks in 
the rear tiers are constructed, erasing such of 
the lines already drawn as become covered by 
blocks in front. 

Continue, always first constructing the most 
remote of the blocks remaining. 

An alternative method for obtaining point L is 
to draw OJ and extend it to the horizon line FE, 
which it cuts at Q. Draw a line from point 12.59
on OC to Q. This line will cut JK at L. This 
alternative method is shorter to apply in most 
cases, but it breaks down if Q happens to be at 
the intersection of OC and FE. 
If any block is completely hidden, it is
necessary to start from the beginning again, 
using a different order in the rows, a higher 
perspective, or a different observation point. 

Drawing front edges slightly heavier, shading 
one face of the blocks, and recording heights on 
tops of blocks are frequently advantageous. 

Label in perspective left and right horizontal 
dimensions, add title, and erase all construction
lines. 
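The construction above is two-point perspective, and a hand drawing can be checked against a computed one. In the following Python sketch the viewing geometry is an assumption: the eye is `eye_height` above the ground and `eye_distance` in front of a vertical picture plane passing through the front corner O, and all names are hypothetical. It shows that vertical edges stay vertical, that receding horizontal edges run to vanishing points on a horizon lying at eye height above O, and that the angle EOF narrows as the eye is raised and widens as it is moved back, as stated in the steps.

```python
import math

def project(point, eye_height, eye_distance):
    """Central projection of a world point (x, y, z) onto the picture plane y = 0.

    Assumed geometry: the eye is at (0, -eye_distance, eye_height) looking along +y,
    the ground is z = 0, and the front corner O of the structure is at the origin,
    so O lands at drawing coordinates (0, 0).
    """
    x, y, z = point
    scale = eye_distance / (y + eye_distance)  # things farther back are drawn smaller
    return x * scale, eye_height + (z - eye_height) * scale

def vanishing_point(direction, eye_height, eye_distance):
    """Drawing point toward which all lines with this horizontal receding direction run."""
    dx, dy, dz = direction
    if dz != 0 or dy <= 0:
        raise ValueError("expected a horizontal direction pointing away from the viewer")
    return dx * eye_distance / dy, eye_height  # on the horizon line, eye_height above O

def angle_eof(theta_deg, eye_height, eye_distance):
    """Angle at O, in degrees, between OE and OF when the block edges are turned theta_deg."""
    t = math.radians(theta_deg)
    e = vanishing_point((math.cos(t), math.sin(t), 0.0), eye_height, eye_distance)
    f = vanishing_point((-math.sin(t), math.cos(t), 0.0), eye_height, eye_distance)
    cos_angle = (e[0] * f[0] + e[1] * f[1]) / (math.hypot(*e) * math.hypot(*f))
    return math.degrees(math.acos(cos_angle))

for height in (2.0, 5.0):
    print(height, round(angle_eof(45.0, height, 10.0), 1))  # about 157.4, then 126.9
```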

The time scale is continuous, and the blocks
from left to right are appropriately in contact
with each other. The states, however, are
discrete, so a diagram with small blank spaces
between the blocks from front to rear would
constitute a more accurate portrayal, but it
would be much more difficult to draw than was
Chart IV XVIII.

CHART IV XX

GROWTH OF POPULATION OF EIGHT LARGEST STATES

[Cross-hatched plat: states (New York, Pennsylvania,
Illinois, Ohio, Texas, Massachusetts, Michigan,
California) against census years 1870, 1890, 1910,
and 1930; legend: population in millions, in
classes two million wide beginning with 0-2.]
The cross-hatched plat: An alternative method
of presentation of the growth in population data 
of Table IV W is by means of a plat with stip- 
plings, shadings, or cross-hatchings to indicate 
the different frequencies, as shown in Chart IV 
XX. 

The plat is more flexible than the block dia- 
gram method, but in general it requires a coarse 
frequency grouping and its features are not 
vividly depicted. 
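The coarse grouping that a plat requires is easy to mechanize. In this Python sketch the class width of 2 follows the two-million classes of the legend of Chart IV XX, while the shading characters, the function names, and the small grid of values are hypothetical, not the data of Table IV W.

```python
def shade(value, class_width=2.0, symbols=" .:-=+*#"):
    """Shading symbol for the class containing value: classes 0-2, 2-4, 4-6, ... (value >= 0)."""
    if value < 0:
        raise ValueError("frequencies cannot be negative")
    index = int(value // class_width)
    return symbols[min(index, len(symbols) - 1)]  # the top symbol also covers larger values

def plat(rows, labels, class_width=2.0):
    """Text plat: one line per row label, each cell drawn as three shading symbols."""
    width = max(len(label) for label in labels)
    return "\n".join(
        label.ljust(width) + " " + "".join(shade(v, class_width) * 3 for v in row)
        for label, row in zip(labels, rows)
    )

# Hypothetical values, for illustration only.
print(plat([[0.5, 3.1], [7.9, 12.6]], ["Row A", "Row B"]))
```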

PROBLEMS 


Problem 1. Plot temporal data of Table IV X. 
TABLE IV X

CHINESE AND JAPANESE POPULATION IN THE U.S.:
1860 TO 1940*

CENSUS YEAR      CHINESE     JAPANESE
   1940           77,504      126,947
   1930           74,954      138,834
   1920           61,539      111,010
   1910           71,531       72,157
   1900           89,863       24,326
   1890          107,488        2,039
   1880          105,465          148
   1870           63,199           55
   1860           34,933

*From the Sixteenth Census of the U.S., 1940.


Problem 2. From 1919-20 data of Table IV Y 
compute the rate of mortality per year. Make 
necessary allowance in computations for the 
unequal age intervals reported. Plot male and 
female rate of mortality curves on same coordinate 
paper. 
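One way to make the allowance for unequal age intervals, sketched here in Python, assumes that the annual survival rate is constant within each interval. If l(x) persons are alive at exact age x and l(x + n) at exact age x + n, write s = l(x + n)/l(x); the rate of mortality per year is then 1 - s^(1/n). This assumption, the function name, and the sample survivors are illustrative; ages reported in months must first be converted to years (6 months = 0.5).

```python
def annual_mortality_rates(ages, survivors):
    """Rate of mortality per year between successive exact ages of a life table.

    ages: exact ages in years, increasing (months converted, e.g. 6 months -> 0.5).
    survivors: number alive at each exact age, e.g. out of 100,000 born.
    Assumes a constant annual survival rate within each interval, so an interval of
    n years with survival ratio s gives an annual mortality rate of 1 - s ** (1 / n).
    Returns (midpoint of interval, annual rate) pairs, convenient for plotting.
    """
    rates = []
    for i in range(len(ages) - 1):
        n = ages[i + 1] - ages[i]
        s = survivors[i + 1] / survivors[i]
        rates.append(((ages[i] + ages[i + 1]) / 2, 1 - s ** (1 / n)))
    return rates

# Hypothetical survivors, for illustration only (not the entries of Table IV Y).
print(annual_mortality_rates([0.0, 2.0, 3.0], [100000, 81000, 72900]))
```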
TABLE IV Y


in 


LIFE TABLES FOR WHITE MALES AND FEMALES IN THE
ORIGINAL REGISTRATION STATES* 


EXACT AGE
IN YEARS 


FEMALES 


1919-1920


males 


females 


0 months 

1 month 

2 months 
4 months 
6 months 

. 8 months 

■! 1 months 


2 

4 

7 

12 


1 00, 000 
95,155 
93,914 
92,039 
90,615 

89,453 

-88, 0 73 


87,374 

85,201 

83,449 

82,251 

81,140 


1 00, 000 
96,213 
95,222 
93,632 
92,406 

91,394 

-30.133 


1 00, 000 


89,774 

87,455 

85,812 

84,651 

83,540 


"90, 757 
89, 050 

36,41 I 
85,321 


100,000 


"92,639 — 
91,070 

88, 703 
87, 739 


17 

22 

27 

32 

37 


42 

47 

52 

57 

§ 2 _ 


80, 068 
78,316 
75, 1 89 
73,301 
70,858 


67,422 

63,440 

58,827 

53,158 

45.916 



82,529 
81,042 
79, 020 
73,727 
74,117 


84,121 
82,2 99 
80, i 29 
77, 650 
74, 899 


71,249 
57,938 
63, 942 
58, 790 
51,999 


71 , 879 
68,441 
64,311 
58, 885 
51.896 


43, 453 
33,185 
22,210 
12,001 
4,762 



86, 673 
84,849 
82,364 
79,615 
76,917 


74,194 
71,070 
67, 1 88 
62, 132 
55,540 


46, 902 
36,131 
24, 425 
13,521 
Problem 3. Examine Table IV Z. What seems to
be the most striking phenomenon revealed? Make a
graph to show this accurately.


TABLE IV Z

SCHOOL ATTENDANCE IN GEORGIA AND WASHINGTON BY AGE: 1940*

                       GEORGIA             WASHINGTON
                              NUMBER                NUMBER
                           ATTENDING             ATTENDING
AGE                 TOTAL     SCHOOL      TOTAL     SCHOOL
7 to 13 years     449,562    413,299    171,555    166,945
14 and 15 years   128,916    101,211     54,185     51,675
16 to 20 years    330,280     98,567    149,134     79,056
21 to 24 years    239,448      6,938    118,905      9,328

* Sixteenth Census of the U.S., 1940: Population,
Second Series, Characteristics of the Population,
United States Summary, p. 76.


Problem 4. Choose the most appropriate inter- 
val possible and plot data of Table IV AA as a 
frequency polygon. Suggest an improvement in 
the original data which would have made still 
more comparable the performances of short and 
tall men. 

Problem 5. Plot the data of Table IV AB as a 
percentile graph. We will first make certain 
nearly arbitrary assumptions: (a) That "under 
1000" covers the range "400 to 1000"; (b) that 
"villages not incorporated" cover the range "30 
to 400"; and (c) that "rural farms" cover the 
range "1 to 30. " A community of size less than 1 
is impossible so a zero percentile of .5 (the 
boundary between 0 and 1) may be plotted. The 
100 percentile should not be plotted, since the 
maximum possible size of community is not deter- 
minable from this sample. Because of the great 
range, 1 to 6,930,446, use a logarithmic scale of
sizes of communities (ordinate). The abscissa may 
TABLE IV AA

STANDING HIGH KICK

Distances Measured to the Nearest Inch
Above or Below Standing Height (s.h.)

Score in Inches
Above s.h.

Frequency


19 
18 
17 
16 
15 
14 
13 
12 
1 1 
10 
9 
8 
7 
6 
5 
4 
3 
2 
1

0 

Below s.h.

1 

2 

3 

4 

5 

6 

7 

8 
9 


1 

1 

2 

3 

2 

1 

7 

3 

9 

9 

7 

12 

19 

22 

15 
25 

16 
10 
12 

7 

7 

1 

2 

1 

2 
2 
0 
1 


N = 203

From Frederick Warren Cozens, The Measurement of
General Athletic Ability in College Men, Univ. of
Oregon Publication, Physical Education Series,
No. 3, April, 1929, page 188.
TABLE IV AB

URBAN-RURAL DISTRIBUTION OF POPULATION OF U.S. IN 1930*

SIZE OF COMMUNITIES               NUMBER    POPULATION
Cities
  6,930,446                            1     6,930,446
  1,000,000 - 4,000,000                4     8,134,109
  400,000 - 1,000,000                 13     8,067,471
  200,000 - 400,000                   23     6,508,600
  100,000 - 200,000                   52     6,685,110
  25,000 - 100,000                   283    12,917,141
  10,000 - 25,000                    606     9,097,200
  5,000 - 10,000                     851     5,897,156
  2,500 - 5,000                    1,332     4,717,590
Villages
  1,000 - 2,500                    3,087     4,820,707
  Under 1,000                     10,346     4,362,746
Villages not incorporated                   14,479,257
Rural farms                    6,288,648    30,157,513

* Abstract of the Fifteenth Census of the U.S., 1930,
pp. 16, 17, 18, 19, 21.


be "Percentage of people residing in communities
of smaller size than size indicated." Since only
13 classes are available, if percentiles
corresponding to the boundaries of these classes
are computed and plotted, the result will be
appreciably more accurate than if integral
percentiles such as P.01, P.05, P.10, etc., are
computed and plotted.
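The points for the percentile graph can be computed as in this Python sketch, which assumes the classes are listed in increasing order of community size, each with its upper boundary and its population. The function name and the three-class example are hypothetical. The base-10 logarithm of each boundary is returned for plotting on the logarithmic scale; under the instructions above, the final 100 percent point would not be plotted.

```python
import math

def percent_below_boundaries(classes):
    """Points for a percentile graph of community sizes.

    classes: (upper_boundary, population) pairs in increasing order of size.
    Returns (upper_boundary, log10(upper_boundary), percentage of all people living
    in communities smaller than that boundary).
    """
    total = sum(population for _, population in classes)
    running = 0
    points = []
    for upper, population in classes:
        running += population
        points.append((upper, math.log10(upper), 100.0 * running / total))
    return points

# Hypothetical classes 1-30, 30-400, 400-1000 with made-up populations.
for point in percent_below_boundaries([(30, 50), (400, 30), (1000, 20)]):
    print(point)
```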

Problem 6. Plot data of Table IV AC ac- 
cording to the five ways for drawing a graph 
mentioned in Section 3 (IV, p. 140) and compare 
results. Employ method (c) first in order to be 
uninfluenced by (a), (b), (d), or (e).
TABLE IV AC

HEAD LENGTHS OF 1306 NON-HABITUAL CRIMINALS*

HEAD                  HEAD                  HEAD
LENGTH  FREQUENCY     LENGTH  FREQUENCY     LENGTH  FREQUENCY
   210          1        196         66        182         28
   209          1        195         56        181         17
   208          0        194         83        180         13
   207          6        193         79        179         12
   206          3        192        102        178          7
   205          8        191         96        177          5
   204         13        190         85        176          3
   203         14        189         83        175          3
   202         24        188         58        174          0
   201         20        187         55        173          2
   200         30        186         57        172          1
   199         35        185         53        171          1
   198         43        184         43        170          0
   197         56        183         24

N = 1306

* Adapted from data given on page lxxv, Tables for
Statisticians and Biometricians, edited by Karl
Pearson, Vol. 1, 1914.


In employing method (d) we will smooth by
replacing each original class frequency by the
average of the 5 nearest class frequencies; e.g.,
6, the frequency for class 207, is replaced by
3.6 (= (1 + 0 + 6 + 3 + 8)/5), and so on for all
classes from 168 to 212 inclusive.
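Method (d) can be sketched in Python using the class frequencies as printed in Table IV AC; classes outside the table are counted as zero, which is why the smoothed distribution runs two classes beyond each end. The function name `smooth_five` is illustrative. Because each original frequency is spread in equal fifths over five classes, the smoothed frequencies have the same total as the original ones.

```python
# Frequencies as printed in Table IV AC, keyed by head length (classes 170 to 210).
HEAD_LENGTHS = {
    210: 1, 209: 1, 208: 0, 207: 6, 206: 3, 205: 8, 204: 13, 203: 14,
    202: 24, 201: 20, 200: 30, 199: 35, 198: 43, 197: 56, 196: 66, 195: 56,
    194: 83, 193: 79, 192: 102, 191: 96, 190: 85, 189: 83, 188: 58, 187: 55,
    186: 57, 185: 53, 184: 43, 183: 24, 182: 28, 181: 17, 180: 13, 179: 12,
    178: 7, 177: 5, 176: 3, 175: 3, 174: 0, 173: 2, 172: 1, 171: 1, 170: 0,
}

def smooth_five(freqs, lo, hi):
    """Replace each class frequency by the mean of the 5 nearest classes (itself, 2 each side).

    Classes missing from freqs count as 0, so smoothing can extend beyond the table.
    """
    return {c: sum(freqs.get(k, 0) for k in range(c - 2, c + 3)) / 5 for c in range(lo, hi + 1)}

smoothed = smooth_five(HEAD_LENGTHS, 168, 212)
print(smoothed[207])  # 3.6, that is (1 + 0 + 6 + 3 + 8) / 5
```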

Method (e) is very simple for these data because
they are so nearly normal, but its execution may
be postponed until later chapters, dealing with
central tendencies, variability, and the normal
distribution, have been studied.
Problem 7. Using data of Table IV AC, compute
P.05, P.10, P.25, P.50, P.75, P.90, P.95, and
also the lowest and highest percentiles that are
permissible.

Compute the standard errors of each of these 
percentiles. 
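A minimal Python sketch of both computations for grouped data, assuming classes of width 1 whose class indexes are listed from the lowest class upward. The percentile is found by linear interpolation within the class containing the pN-th case. For its standard error the sketch uses the common large-sample approximation √(pq/N)/y, where y is the relative frequency per unit of measurement in that class; this choice, the function names, and the uniform test distribution are assumptions for illustration.

```python
import math

def grouped_percentile(indexes, freqs, p, width=1.0):
    """Percentile P_p by linear interpolation within the class containing the pN-th case.

    indexes: class indexes (midpoints) in increasing order; freqs: their frequencies.
    Returns (percentile value, relative frequency per unit in the class used).
    """
    n = sum(freqs)
    target = p * n
    below = 0.0
    for index, f in zip(indexes, freqs):
        if f > 0 and below + f >= target:
            lower = index - width / 2.0  # lower boundary of this class
            return lower + (target - below) / f * width, f / (n * width)
        below += f
    raise ValueError("p must not exceed 1")

def percentile_standard_error(indexes, freqs, p, width=1.0):
    """Approximate standard error sqrt(p * q / N) / y of the percentile P_p."""
    n = sum(freqs)
    _, density = grouped_percentile(indexes, freqs, p, width)
    return math.sqrt(p * (1.0 - p) / n) / density
```

For Table IV AC the class indexes would run from 170 to 210, with the frequencies listed in the same order.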

Problem 8. Using data of Table IV W, select
scales, left, right, and vertical, which are
quite different from those of Chart IV XVIII, and
construct a block diagram. "Proofread" it for
accuracy by careful visual comparison of relative
heights with those of Chart IV XVIII and by
sighting down lines to see if they do converge
as required.

Problem 9. The accompanying problem involves
plotting contours and interpolating by a method
one degree more refined than linear interpolation,
and so as to preserve the frequencies in intervals
rather than the frequencies at specific
points (class indexes). It is entirely
feasible to present the data of Table IV AD in a 
block diagram, but they can also be well shown by 
frequency contour lines on a mat, the horizontal 
and vertical dimensions of which are age and 
grade. Lines for frequencies 100, 1000, 5000, 
10,000, 15,000, 20,000, 25,000, 30,000, and 
35,000 will suffice. 

One may interpolate in the figures given by 
rows, and also again in the figures given by 
columns, to obtain grade and age points for each 
frequency, 100, 1000, etc. Connecting all the 
points for 100 gives a contour line for this 
frequency, and similarly for the other contour 
lines. Linear interpolation for points on the 
contour line for the frequency 100, though not 
altogether satisfactory, may suffice, but it 
TABLE IV AD

AGE-GRADE DISTRIBUTION OF CALIFORNIA SCHOOL CHILDREN
(Compiled by Research Division of the University of California)

AGE AT LAST                            GRADE
BIRTHDAY      1.5    2.5    3.5    4.5    5.5    6.5    7.5    8.5   TOTAL

21.5           13      6      7      9      5      2     39      6      87
20.5            1      4      3      5      3      9     17     18      60
19.5            8      5      9      7     15     14     35     71     164
18.5           14     16     23     39     38     57    139    319     645
17.5           30     33     62    102    150    255    575   1340    2547
16.5           67    108    146    305    550    928   2176   4903    9183
15.5          123    168    317    686   1279   2621   5823  10791   21808
14.5          203    283    600   1417   2715   6151  13517  16531   41417
13.5          255    538   1113   2603   5959  11451  17364  11644   50927
12.5          700   1011   2373   5736  11685  18353  12402   2704   54964
11.5          889   2055   5271  11103  19509  12844   2708    251   54630
10.5         1912   4468  12217  21458  14377   3091    210      8   57741
9.5          4169  11288  24241  15875   2602    166      9            58350
8.5         11373  26688  18213   2562    136      4                   58976
7.5         34609  19598   2084    106      2                          56399
6.5         31206   1167     34                                        32407
UNDER 6.0    1203     15                                               1218

TOTAL       86775  67451  66713  62013  59025  55946  55014  48586  501523


hardly will for the other contour lines, for the
age and grade groupings are coarse and the
frequencies large. Since "under six" includes
four- as well as five-year-olds, neither the
"under six" frequencies nor any interpolation
involving them will be entirely satisfactory.
The error in treating all "under six" cases as
five-year-olds should not, however, be great.
Since no frequencies are given for kindergarten
classes, no interpolated frequency values
involving grade .5 (middle of the kindergarten
year) will be satisfactory. Because of the lack
of kindergarten data, neither linear nor parabolic
interpolation as here described should involve
grade .5. Consider the four sequential frequencies
0, 1203, 31206, and 34609. The six contour lines
for frequencies of 5,000, 10,000, 15,000, 20,000,
25,000, and 30,000 will pass between the tabled
values 1203 and 31206. Linearly interpolated
values would be undesirably and unnecessarily
crude. Parabolic interpolation would yield values
lying on a parabola fitted to three points, 0,
1203, and 31206, or 1203, 31206, and 34609. To
avoid the issue as to which set of three to
employ, and to get at the same time somewhat more
trustworthy results, we may fit a parabola passing
through the points 1203 and 31206 and coming as
near as possible, in the least-squares sense, to
the neighboring points 0 and 34609. The equation
of such a parabola is

$$ f = \frac{1}{16}\left(-f_1 + 9f_2 + 9f_3 - f_4\right) + (f_3 - f_2)\,\xi + \frac{1}{4}\left(f_1 - f_2 - f_3 + f_4\right)\xi^2 \qquad [4:04] $$

in which $f_1$, $f_2$, $f_3$, and $f_4$ are the four tabled
frequencies in order and $\xi$ is a deviation in
years from 6.0, i.e., a point halfway between
the class index values for the second and third
tabled values. From this equation $\xi$ may be
determined for the needed contour levels,
$f = 5000$, $f = 10{,}000$, $f = 15{,}000$, $f = 20{,}000$,
$f = 25{,}000$, and $f = 30{,}000$. For example, for the
5000 level we have

$$ 5000 = \frac{1}{16}(0 + 10827 + 280854 - 34609) + (31206 - 1203)\,\xi + \frac{1}{4}(0 - 1203 - 31206 + 34609)\,\xi^2 $$
yielding $\xi = -.371$ or $-54.2$, this latter value
being extraneous, as it is obvious that the root
desired will lie between $-.5$ and $.5$. The value
$\xi = -.371$ is equivalent to age 6.629, the age for
grade one at which the frequency is 5000.
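The arithmetic of this example can be checked with a short Python sketch of equation [4:04] (the function names are illustrative). It forms the coefficients of the parabola from the four frequencies and keeps the root lying between -.5 and .5, discarding the extraneous one.

```python
import math

def parabola_4_04(f1, f2, f3, f4):
    """Coefficients (a, b, c) of f = a + b*xi + c*xi**2 in equation [4:04]."""
    a = (-f1 + 9 * f2 + 9 * f3 - f4) / 16
    b = f3 - f2
    c = (f1 - f2 - f3 + f4) / 4
    return a, b, c

def solve_for_xi(a, b, c, level, lo=-0.5, hi=0.5):
    """Root of a + b*xi + c*xi**2 = level lying in [lo, hi]; the other root is extraneous."""
    if c == 0:
        roots = [(level - a) / b]
    else:
        disc = math.sqrt(b * b - 4 * c * (a - level))
        roots = [(-b + disc) / (2 * c), (-b - disc) / (2 * c)]
    for r in roots:
        if lo <= r <= hi:
            return r
    raise ValueError("no root in the requested interval")

coefficients = parabola_4_04(0, 1203, 31206, 34609)
print(round(solve_for_xi(*coefficients, 5000), 3))  # -0.371
```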

Though this method should give fair results,
the following may logically be expected to give
somewhat better values. We first note that the
area under any stretch of curve [4:04] should
correspond to the number of cases in that
interval. Let us find by integral calculus this
area from $\xi = -1.00$ to $\xi = .00$. The answer is
1249, which differs from the correct value, 1203,
by 46. Also, in the next interval the number of
cases under the parabola is 31252, which is in
error by 46. We may avoid these discrepancies by
fitting a parabola having the observed areas for
each of the two middle intervals and coming as
close as possible, in the least-squares sense, to
having the requisite areas in the first and
fourth intervals. The equation of this parabola
differs only in the constant term from [4:04].
It is

$$ f = \frac{1}{12}\left(-f_1 + 7f_2 + 7f_3 - f_4\right) + (f_3 - f_2)\,\xi + \frac{1}{4}\left(f_1 - f_2 - f_3 + f_4\right)\xi^2 \qquad [4:05] $$

which is fully as simple to use as [4:04]. For 
the frequency level 5000 we have 

$$ 5000 = \frac{1}{12}(0 + 8421 + 218442 - 34609) + (31206 - 1203)\,\xi + \frac{1}{4}(0 - 1203 - 31206 + 34609)\,\xi^2 $$

yielding $\xi = -.370$ and $-54.2$, this latter value
being extraneous. The value $\xi = -.370$ is
equivalent to age 6.630, which happens to be
negligibly different from 6.629 derived by the
preceding method.
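A companion Python sketch for [4:05] (names illustrative) reproduces ξ = -.370 and confirms the property that motivated the new constant term: under [4:05] the areas over the two middle intervals equal the observed 1203 and 31206, whereas under [4:04] they are 1249 and 31252.

```python
import math

def parabola_4_04(f1, f2, f3, f4):
    """Coefficients (a, b, c) of f = a + b*xi + c*xi**2 in equation [4:04]."""
    return (-f1 + 9 * f2 + 9 * f3 - f4) / 16, f3 - f2, (f1 - f2 - f3 + f4) / 4

def parabola_4_05(f1, f2, f3, f4):
    """Coefficients of equation [4:05]: only the constant term differs from [4:04]."""
    return (-f1 + 7 * f2 + 7 * f3 - f4) / 12, f3 - f2, (f1 - f2 - f3 + f4) / 4

def area(coeffs, x0, x1):
    """Exact integral of a + b*xi + c*xi**2 from xi = x0 to xi = x1."""
    a, b, c = coeffs

    def antiderivative(x):
        return a * x + b * x ** 2 / 2 + c * x ** 3 / 3

    return antiderivative(x1) - antiderivative(x0)

freqs = (0, 1203, 31206, 34609)
old, new = parabola_4_04(*freqs), parabola_4_05(*freqs)
print(round(area(old, -1, 0)), round(area(old, 0, 1)))  # 1249 31252
print(round(area(new, -1, 0)), round(area(new, 0, 1)))  # 1203 31206

# Solve a + b*xi + c*xi**2 = 5000; with c > 0 the larger root is the one between -.5 and .5.
a, b, c = new
xi = (-b + math.sqrt(b * b - 4 * c * (a - 5000))) / (2 * c)
print(round(xi, 3))  # -0.37
```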

The determinations using [4:05] of all other
contour points and the plotting of contour lines
are left as an exercise. For checking purposes a
few values for the contour level 5000 are given
herewith: (age 7.5, grade 3.287), (age 8.5,
grade 4.322), (grade 1.5, age 6.630 and also age
9.161), (age 6.5, grade 2.240). In the
calculation of this last value we have the data

              Grade 1.5   Grade 2.5   Grade 3.5   Grade 4.5
Age 6.5           31206        1167          34           0

The contour point for level 5000 lies between
grade 1.5 and grade 2.5, but as we have no
recorded value for grade .5 we cannot use the
method just illustrated. However, we can derive
the equation of a parabola reproducing exactly
the areas in each of the first three intervals.
If the origin is 2.00, i.e., a point midway
between the first and second class indexes, and
the frequencies in these three classes are called
$f_2$, $f_3$, and $f_4$, the equation is

$$ f = \frac{1}{6}\left(2f_2 + 5f_3 - f_4\right) + (f_3 - f_2)\,\xi + \frac{1}{2}\left(f_2 - 2f_3 + f_4\right)\xi^2 \qquad [4:06] $$

For the data in hand this is

$$ f = 11368.8 - 30039\,\xi + 14453\,\xi^2 $$

which, when $f = 5000$, gives $\xi = .240$, corresponding
to grade 2.240.

It is interesting to note in passing that, in
a situation wherein $f_1$, $f_2$, $f_3$, and $f_4$ exist, if
$f$ as given by [4:06] and $f$ as given by a similar
equation involving $f_1$, $f_2$, and $f_3$ are averaged,
the resulting expression for $f$ is [4:05].
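A final Python sketch (names illustrative) solves [4:06] for the age 6.5 row and checks the remark just made with arbitrary frequencies. The mirror-image equation built from f1, f2, and f3 is derived here on the same principle of reproducing the areas of three adjacent intervals, (-2, -1), (-1, 0), and (0, 1); its coefficients are an assumption of this sketch, since the text does not print them.

```python
import math

def parabola_4_06(f2, f3, f4):
    """[4:06]: parabola whose areas over (-1, 0), (0, 1), (1, 2) are f2, f3, f4."""
    return (2 * f2 + 5 * f3 - f4) / 6, f3 - f2, (f2 - 2 * f3 + f4) / 2

def parabola_mirror(f1, f2, f3):
    """Same principle with areas f1, f2, f3 over (-2, -1), (-1, 0), (0, 1)."""
    return (-f1 + 5 * f2 + 2 * f3) / 6, f3 - f2, (f1 - 2 * f2 + f3) / 2

def parabola_4_05(f1, f2, f3, f4):
    """Coefficients of [4:05], for comparison."""
    return (-f1 + 7 * f2 + 7 * f3 - f4) / 12, f3 - f2, (f1 - f2 - f3 + f4) / 4

# Age 6.5 row: grades 1.5, 2.5, 3.5 hold 31206, 1167, 34; the origin is grade 2.00.
a, b, c = parabola_4_06(31206, 1167, 34)
# With c > 0 the smaller root is the one between -.5 and .5; the other is extraneous.
xi = (-b - math.sqrt(b * b - 4 * c * (a - 5000))) / (2 * c)
print(round(2.0 + xi, 3))  # 2.24, i.e., grade 2.240

# Averaging [4:06] with its mirror image reproduces [4:05] (arbitrary test frequencies).
f = (3, 11, 17, 5)
averaged = [(u + v) / 2 for u, v in zip(parabola_4_06(*f[1:]), parabola_mirror(*f[:3]))]
print(all(math.isclose(u, v) for u, v in zip(averaged, parabola_4_05(*f))))  # True
```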

Problem 10. Vocabulary Review


bar diagram 

basal year 

block diagram 

categorical 

center of population 

circle chart 

class frequency 

class index 

class interval 

class limits 

contingency table 

contour lines 

cumulative frequencies 

cycle 

frequency polygon 
geographic 
golden mean 
grid 

growth curve 

histogram 

horizon 

inflexion, point of 

least squares 

linear interpolation 

logarithmic chart 

median 

minor mode 

mode 

normal population 
ogive 

parabolic interpolation 
percentile 
percentile curve 
perspective 


pictogram 
pie chart 
plot 

price index 

pseudo growth curve 

quadratic mean 

qualitative 

quantitative 

relative time chart 

sample 

scatter diagram 

segmented bar 

semi-logarithmic chart

skewed curve 

spatial 

standard deviation 
standard error 
temporal 
time chart 
trend 

vanishing point 
zero point, natural 
σ (sigma)

Σ (capital sigma)
ξ (xi)

χ² (chi square)

p P 

p 

q 

v P 

ip 

ip 

Fp 


CHAPTER V 


THE STABLE FEATURES OF PHENOMENA

SECTION 1. THE QUEST FOR CERTAINTY

The yogi seeking peace in Brahma, the philo- 
sophical idealist finding reality in his own 
world of concepts, and the mathematician delving 
ever deeper into the abstractions of pure mathe- 
matics are united in abjuring the turmoil of 
human passions and the strife and struggle of 
competitive life and each is seeking in his sepa- 
rate way to find an anchor, a Rock-of-Ages, an
everlasting law of being. The stabilities of
life that each finds, or thinks he finds, are
self-satisfying and eternal in the mind of the
conceiver. Without questioning the existence and
reality of stabilities of this sort we can note
that their source is strictly and individually
psychological, depending upon the logical,
instinctive, and habitual mental processes of the
individual. We may also note that the
applicability of universal and eternal principles and
beliefs to concrete issues is never unquestioned, 
but in each instance subject to a judgment of 
pertinence. The pertinence has to be established 
by observation or experience and not by pure 
deduction. The concepts of stability referred
to, but not their fields of practical import, 
have a nonstatistical basis. Let us freely grant 
that this field of life does exist, and that it 
has an importance as judged by different indi- 
viduals from supreme to trivial, and that it
lies beyond the pale of experimental statistics. 

The need and the search for anchors in life 
is not limited to philosophical idealists. They 
are, perchance, in less need of them than other
folk, including avowed pragmatists and empiri- 
cists. The stabilities that these latter seek 
differ in three important respects from those 
sought by the former: they are relative, not
absolute, they are observationally determined, 
and they serve a finite purpose in an observably 
limited realm. This is the field served by ap- 
plied statistics. 

The concept of relatively ultimate durability 
or stability is valuable. Suppose it is reported 
that "Herman Rosenwald is a ten-year-old Jewish 
boy having an intelligence quotient of 130. " The 
age item may be reacted to in a fixed manner for 
such a period of time as may be quite sufficient 
for many immediate issues. That his intelligence 
quotient is 130 is more stable temporally than 
was his 10-year-oldness. But this concept,
consequent to some fallible measurement device 
applied at some moment in time, must be thought 
of as only relatively stable and only so with 
reference to certain none too clearly defined 
fields of functioning. That he is Jewish has
greater temporal stability, but is of variable 
utility if used as an interpretative aid in de- 
limiting mental and emotional conditions and 
activity. Though each of these items has relative, 
not ultimate, stability, nevertheless each is 
undoubtedly serviceable within limits, which are
discoverable with greater or less precision. 

Statistical study is primarily a search for 
features of relative stability in connections
not known from earlier study to possess them. 
This search should not be indiscriminate. 

SECTION 2. BIOLOGICAL AND SOCIAL STABILITY

That there is form to life has been proclaimed 
by philosophers both early and late, and that it
is rooted in biological, psychological, and
social laws of survival and accommodation has
been clearly pointed out by Charles Darwin,
Lester Frank Ward, and others. The revelation of
such form is the prime purpose of statistical
study. Phenomena are conditioned in certain
inherent ways which, though not immediately
obvious, are compelling in their power to limit
the range of values, and the relative frequencies 
of the different possible values, which can be 
taken. 

If a cow is tied to a stake by a stout elastic 
tether, she is constricted in her grazing just 
as really as if tied with a Manila rope. Gener- 
ally speaking, the restrictions upon social and 
biological phenomena are elastic, not rigid, and 
the strengths, or lengths, or existence, of these 
elastic bonds must be inferred by a study of 
tendencies, not of rigid limits. Can we infer 
the restricting effect of the elastic tether by 
observations as to the thoroughness with which 
the turf has been grazed? We can, but, primarily, 
not by noting the most extreme tufts plucked. 
Such a tuft might have had so succulent an appeal 
that the cow charged for and attained it at a 
distance quite beyond her usual grazing radius. 
By a careful study of the stubble we can obtain 
evidence not only as to the center of the grazing 
area, but also as to the elasticity or vari- 
ability of the restrictive influence. If cow, 
tether, and stake are removed, and stake-hole 
obliterated, we can approximately reconstruct 
the situation from the evidence yielded by the
stubble, making close estimates of the location 
of the stake and of the variably restrictive 
influence of the tether. 

Many phenomenological situations are charac- 
terized by a central tendency with elastic bounds. 
An oyster growing a thicker and thicker shell is 
more and more adequately protected from enemies 
but also more and more encumbered in movement. 
There is thus an optimum shell thickness (a
central tendency) coupled with decreasingly
permissible variability from it (a typical
pattern of variability). The interest
rate on a mortgage of certain amount, with 100 
shares of stock A as collateral, would not be a 
fixed rate if there were many independent place- 
ments of such a mortgage. The lower the rate the 
fewer lenders, and the higher the rate the fewer 
borrowers, with the result that there would be 
a central tendency and a typical pattern of 
variability from it. Children of a given age are 
distributed throughout a number of school grades. 
They have a central tendency of scholastic ability 
and there exists a central tendency in school 
grade attended. As their abilities are farther 
and farther below the average there is greater 
and greater pressure by school authorities to 
lower the school grade attended, and as they 
are more and more able there is greater and 
greater individual effort to secure extra pro- 
motions so as to profit by higher grade offer- 
ings. Again a central tendency and a typical 
pattern of variability. Innumerable illustrations 
can be given. 

There is no logical reason for believing that 
successive increases of .1 millimeter in thick- 
ness of oyster-shell above the central-tendency 
value will have a protective influence just 
equivalent in survival terms to the mobility 
benefit of successive decreases of .1 millimeter
below the central-tendency value. The detrimental
effect of increased shell weight is conditioned
by certain environmental conditions (perhaps
getting food) which determine the variability
pattern upon the above-average-shell-thickness
side of the distribution, while the detrimental
effect of decreased shell weight is consequent to
other environmental conditions (perhaps escaping
enemies), and this may well lead to a different
variability pattern upon the below-average side
of the distribution. The net result is a
nonsymmetrical, or skewed, distribution. The
disinclination of lenders to lend
at low interest rates is a social phenomenon of 
a different sort from that of the disinclination 
of borrowers to borrow at high interest rates, 
so again we should expect the patterns of vari- 
ability below and above the average interest 
rate to differ. Certainly the school pressure 
to demote the dull is of a different order from 
that of the individual urge to profit by extra 
promotions if bright. 

We conclude that the skewed distribution is
to be expected in physical, biological,
psychological, and social life. We may also
conclude that a knowledge of central tendency, of
variability, and of skewness (these last two
together being a convenient summary of knowledge,
separately, of variability below the average and
above it) is knowledge of three characteristic
and quite independent aspects of phenomena which
yield distributions. An equivalent statement is
that these aspects are stable features of
phenomena of a certain type, recurring with slight
modification from sample to sample.

Compare the enlightenment which the statement
"12½-year-old Bessie is in the low seventh grade"
carries to the untutored and to the statistically
informed auditor. The reaction of the first may
be "What of it?" and of the second, "Bessie is
bright and ambitious; she and her parents have
progressively through a number of years fought
the lock-step regimen of the school; she either
entered school at an early age and was bright
enough to profit by it or has been doubly
promoted twice, for she is two half-grades above
the average, a status attained by less than twenty
per cent of her age-group; and her scholastic
achievement is about equal to that of average low
eighth-graders." The last of these items, and in
part the first, depends upon knowledge of a
statistic other than the three mentioned, namely,
that children who get extra promotions tend to
get them to a grade location half as far above
the average grade for their age as their ability
warrants. The statistically informed auditor has
had knowledge of four stable features of
age-grade-achievement phenomena that the
untutored lacked. Furthermore, there are no
substitutes for this knowledge, for, lacking the
information inherent in the four statistical
items mentioned, one would find it impossible to
reach the same conclusion with equal certainty.
As with a deduction from any other information,
it must of course be recognized that a conclusion
drawn from these four items carries an element of
uncertainty; e.g., the conclusion, from the facts
given, that Bessie has the average scholastic
ability of low eighth-graders has a probable error
of approximately one-half year.

Statistics should enable one to see the indi- 
vidual event in a setting and thus to know it as 
only he can who knows the background. The notion 
that statistical study deadens sensibility to 
individuality is utterly false. A person is more 
truly individualized when a given set of items 
about him are seen in the perspective of the 
population to which he belongs than when viewed 
as a unique event. To claim complete
individualization when viewed as a unique event asserts
the ignorance of the observer rather than the 
individuality of the subject. 

The balance that biologists repeatedly find 
to exist in the forms of life under a certain 
environment represents a type of stability dis- 
covered, described, and attested to with a known 
certainty by statistical study. The concept here 
is not stability within, or of, a single distri- 
bution, but stability of frequencies of oc- 
currence of items of different sorts. The simp- 
lest case exists when two forms of life interact 
existentially. Ecology and parasitology are 
sciences concerned with stable relationships 
between living things. If a weevil, host to
some parasite, exists in large numbers, the living
conditions of the parasites are more favorable,
and they multiply to the point of excessive
killing of their hosts; in consequence they must
die in large numbers from duress of living
conditions, whereupon the hosts multiply again,
and so it continues until some natural balance is
reached. Clearly, in this
situation, the relative frequencies of host and 
parasite are crucial items of information; es- 
pecially informative are these frequencies under 
stable life conditions. 

The first concern of marine biology is perhaps 
the balance in the frequencies of the various 
forms of life in the homogeneous environment 
that the ocean provides. This balance can be 
highly trusted unless some new factor, such as 
man’s predatory activities, changes things. In 
a certain social structure there can exist to 
mutual benefit so many milkmen to so many grocers, 
to so many barbers, to so many shoemakers, etc., 
and if some number should happen to be excessive 
it will tend to right itself. The same type of
phenomenon is found in sundry chemical reactions.

Throughout the entire field of physical, biological,
and social life we find that the frequencies, or
proportions, in classes tend, under
stable conditions, to fixed values. The trust- 
worthy discovery of these proportions in classes 
is a fertile field of statistics. It is a phe- 
nomenological stability other than that discussed 
under form of distributions. However fixed these 
proportions are under stable conditions, they 
show typical growth or cyclical features under 
changing conditions. The proportion of people 
engaged in airplane manufacture has hardly at- 
tained such a value that we can look upon it as 
approximately fixed from year to year. For 
comprehension we require fixed reference points.
If the proportion engaged in airplane manufacture 
does not provide one we look further. We might 
investigate if the rate of change of this pro- 
portion is constant, or geometric, or definable 
in some demonstrably precise terms. We must find 
some anchorage or expect that none but hazardous 
appraisals can be made. The logical need for 
awareness of the stabilities in phenomena is the 
need that statistics serve. 
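As a minimal sketch of the search just described, the function below asks whether a series of proportions changes by nearly constant steps (an arithmetic rate of change) or by a nearly constant ratio (a geometric rate). The comparison rule, the use of the coefficient of variation as the yardstick of steadiness, and both six-year series are illustrative assumptions, not data from the text.

```python
def relative_spread(values):
    """Coefficient of variation: standard deviation divided by the mean."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return var ** 0.5 / abs(mean)


def classify_growth(series):
    """Return "arithmetic" if first differences are steadier than successive
    ratios, otherwise "geometric" (ties count as geometric)."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    ratios = [b / a for a, b in zip(series, series[1:])]
    if relative_spread(diffs) < relative_spread(ratios):
        return "arithmetic"
    return "geometric"


# Hypothetical proportions for six successive years.
by_ratio = [1.0, 1.5, 2.25, 3.375, 5.0625, 7.59375]  # each year 1.5 times the last
by_step = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5]             # each year 0.5 more than the last

print(classify_growth(by_ratio))  # geometric
print(classify_growth(by_step))   # arithmetic
```

Real series are never this clean; the sketch only shows the kind of anchorage being sought, and a series steady by neither test would send one looking for some other demonstrably precise form.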

In studying proportions in classes and seek- 
ing evidence of stability therein it would seem 
necessary that the individuals in the classes be 
in some sense, or to some degree, interdependent.
A study of the relative frequencies of cotton 
boll weevils and their parasites would be profit- 
able, but a joint study of cotton boll weevils 
and citrus tree scales would not. Just as there 
are conflicting and interacting influences at 
work in determining the form of distribution, so 
there are in the matter of frequencies in classes. 
We are interested in those situations where 
these frequencies do in some sense directly 
interact or are affected by common causes. 

In choosing cases for statistical study it is
safe to say that the appropriate members should
always be competitive or interacting individuals.
A certain study of the intelligence of hoboes and
college graduates can be cited. The relationships
reported, of course, had no guidance or
classificatory importance, and time has shown them
to have been very misleading to readers unaware of
the triviality of relationships between
noncompetitive groups.

We will attempt to classify the sorts of sta- 
bility possible in phenomena, admitting the 
likelihood of still other sorts not yet dis- 
covered or conceived. 

SECTION 3. TEMPORAL DATA

Observations over a period of time may show 
(1) continuity of value from one moment to the 
next, (2) a general trend, (3) periodic fluctu- 
ations, (4) nearly constant proportionate re- 
lationships to some other variable, or variables, 
(5) evidences of limits below which, or beyond 
which, values cannot occur, (6) simplicities in 
any of the preceding five when some transfor- 
mation of the time variable is made. Presumably, 
knowledge of the existence of a temporal series 
is direct and initial. That is all that is ne- 
cessary to suggest the examination of the data 
along one or more of the six lines mentioned, 
with the hope of discovering its stable features. 
As soon as a type of stability is discovered, a 
foundation is laid for a study of causes and 
meanings.

SECTION 4. GEOGRAPHICAL DATA

Observations which are intrinsically connected 
with position in two-dimensional space may show 
(1) continuity of value from point to neighboring 
point, (2) discontinuity from point to neighbor- 
ing point, (3) geographic patterns of frequency, 
(4) geographic patterns of categories, (5) direct 
or (6) inverse, frequency relationship from point
to point, (7) patterns in combination types of
frequencies, (8) simplicities in any of the pre- 
ceding seven under transformations of space. 
Knowledge of the existence of geographical data 
is direct and initial and immediately suggests 
search for stabilities of the eight sorts listed 
and for their causes and meanings. 

Temporal and geographical: Many situations
proclaim themselves as both temporally and
geographically conditioned, as for example do farm
crops, and then stabilities involving all
combinations of the sorts in the two types listed
are within the realm of possibility. The problem of
discovering stable features is greatly complicated,
but the general modes of search are indicated by
the 6 × 8 combinations.

SECTION 5. QUANTITATIVE AND QUALITATIVE DATA

Much data may be judged to be relatively 
independent of the time and place of collection 
in the sense that the issues involved are not 
dependent upon them, as, for example, would be 
a candidate’s record upon a Civil Service
examination. With data of this sort we look for stable
features in the form of single distributions and 
in the relationships of two or more paired distri- 
butions. 

Qualitative series: The fundamental types of 
stability are (1) proportions in classes as 
successive samplings are taken, (2) relationships 
between these proportions, (3) existence for 
these proportions of upper and lower limits other 
than one and zero, and (4) relationships of
super- and subordination between classes. Such
relationships may be invariable or tendential.

A combination of stabilities of types (2) and 
(4) is revealed in many genetic chromosomal 
studies and similar conditions are to be expected 
in psychological vocational studies.

The statistics of sampling, association, and
contingency are well developed and provide keen- 
edged tools for analysis of qualitative data. 
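For the first type of qualitative stability, proportions in classes as successive samplings are taken, a minimal sketch follows. It computes the pooled proportion, the binomial standard error of a proportion from one sample of n, and the Lexis ratio, that is, the observed spread of the sample proportions divided by the spread binomial sampling alone would produce. The counts are hypothetical, and the k − 1 divisor for the observed spread is one common convention among several.

```python
import math


def proportion_summary(counts, n):
    """Summarize k successive samples of n cases each.

    counts[i] is the number of cases falling in the class of interest in
    sample i. Returns the pooled proportion, the binomial standard error of a
    proportion from a single sample of n, and the Lexis ratio.
    """
    k = len(counts)
    props = [c / n for c in counts]
    p_bar = sum(props) / k
    # Standard error expected from simple binomial sampling alone.
    se_binomial = math.sqrt(p_bar * (1 - p_bar) / n)
    # Observed spread of the k sample proportions (divisor k - 1 assumed).
    sd_observed = math.sqrt(sum((p - p_bar) ** 2 for p in props) / (k - 1))
    # Near 1: binomial-like stability; well above 1: extra instability
    # between samples; well below 1: more uniformity than chance allows.
    lexis = sd_observed / se_binomial
    return p_bar, se_binomial, lexis


# Hypothetical counts in five successive samples of 100 cases.
p, se, ratio = proportion_summary([30, 34, 27, 31, 28], 100)
print(round(p, 3), round(se, 4), round(ratio, 2))
```

A ratio well above 1 would suggest that the proportion is not a single stable value across samples, which is exactly the stability in question.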

Quantitative series: For quantitative series 
the fundamental types of stability reside in the 
parameters, or constants, which are definitive 
of a distribution or of distributions in their 
relationships. These are measures of (1) central 
tendency, (2) variability, (3) skewness, (4) 
kurtosis, (5) multimodality, (6) limits, (7) ex- 
cluded values, (8) stability of each as affected 
by principle of sampling employed, (9) stabilities 
revealed as transformations of the scale of 
measurement are made, (10) relationships in all 
these respects of two or more paired series. 

Generally items (1), (2), (3), (4), and (5) 
are of very unequal excellence in the information 
they give about a parent population. Many distri- 
butions are such that a sample of N cases will 
yield trustworthy information about, say, (1) 
and (2) and not about (3), (4), or (5). The
higher moments $\mu_3$, $\mu_4$, etc., have been
proposed as descriptive statistics for extremely
skewed and leptokurtic series even where they are
not as justified on the ground of stability as
other, neglected, statistics. No matter
how neat a statistic may be algebraically, its 
real merit is dependent upon its stability, the 
precision with which it depicts a feature of the 
population. 
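The point about unequal stability can be checked by simulation. The sketch below assumes an exponential parent population as a convenient skewed example. It draws repeated samples of 50 and reports, for each statistic, its standard deviation across samples relative to its average value; a larger figure means a less trustworthy statistic. The parent distribution, sample size, number of repetitions, and seed are all arbitrary illustrative choices.

```python
import random
import statistics


def sample_statistics(xs):
    """Mean, variance, third and fourth central moments, and median of one sample."""
    n = len(xs)
    m = sum(xs) / n
    dev = [x - m for x in xs]
    return {
        "mean": m,
        "variance": sum(d ** 2 for d in dev) / n,
        "mu3": sum(d ** 3 for d in dev) / n,
        "mu4": sum(d ** 4 for d in dev) / n,
        "median": statistics.median(xs),
    }


def relative_instability(n=50, reps=2000, seed=1):
    """Spread of each statistic across repeated samples, relative to its average."""
    rng = random.Random(seed)
    results = {key: [] for key in ("mean", "variance", "mu3", "mu4", "median")}
    for _ in range(reps):
        xs = [rng.expovariate(1.0) for _ in range(n)]  # skewed parent population
        for key, value in sample_statistics(xs).items():
            results[key].append(value)
    return {key: statistics.pstdev(vals) / abs(statistics.fmean(vals))
            for key, vals in results.items()}


for name, value in relative_instability().items():
    print(f"{name:9s} {value:.2f}")
```

With these settings the mean and the median vary far less, relative to their size, than the fourth moment does.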

Amateurs err in coining and using an odd sta- 
tistic because of some simplicity, real or fan- 
cied, in meaning, irrespective of and in igno- 
rance of its instability. In illustration may 
be mentioned "the percentage of those getting A 
in mathematics who also got A in law" as a
measure of correlation.

It must be admitted that professionals have
occasionally erred in using a statistic because
of its amenability to algebraic treatment,
irrespective of, though scarcely in ignorance of,
its instability. In illustration may be mentioned
the use of $\mu_4$ and even higher moments,
though other measures such as percentiles and
proportions in classes could reveal otherwise
undisclosed though trustworthy features of the
population.

Of two alternative statistics, each of sufficient
reliability for samples of the size in question,
we may well choose the one of greater algebraic
simplicity, but we should never employ an
unreliable statistic, no matter how simple it is
algebraically.

Quantitative and qualitative: Many situations
combine the separate issues of the quantitative
and qualitative series. One combination, well-nigh
universal, is found when quantitative differences
inhere in the very trait that has led to
categorization. If one class of fruit fly is
labeled "stubby wing," it will nevertheless be
found that there are different degrees of
stubbiness in those placed in this class; even in
that class of people labeled "male" it is found
that a variability in maleness exists. The same
issue arises in quantum-theory physics. The
standard statistical approach to this problem is
first to abstract all the information possible by
treating the data as qualitative, and then to
abstract still more information by superimposing a
quantitative treatment on this. Where such added
quantitative treatment is not immediately feasible,
the student should hold the strong conviction that
he has but partially (we may even say but
superficially) plumbed the issues. In the field of
psychology the quantification of phenomena earlier
thought of as qualitative has been one of the most
fruitful means of advance. We may expect it to
continue to be so both in psychology and in other
fields of physical and social science.

Another common combination of the two is found 
when quantitative and qualitative series enter 
into the single problem, as for example is the 
case where a person’s intelligence (mainly quanti- 
tative as commonly measured) and his sex (quali- 
tative as commonly measured) affect his fitness 
for some task. Though such bastard devices as 
tetrachoric and biserial correlation have been 
developed to cope with this situation, the student 
cannot help but feel that these measures hold 
but a part of the information necessary for a 
true* solution of the problem. 

The qualification of seemingly quantitative 
phenomena is also a frequently illuminating 
process. The need for it would seem to exist 
where the so-called quantitative measure is 
largely subjective or hastily conceived, as for 
example is the case with measures of "general
intelligence."

SECTION 6. COMPLEX DATA


Only the simpler types of complexity have been
mentioned: temporal-geographical and
quantitative-qualitative. However, very real and
important problems involving all combinations of
the various series exist. It is, in fact, quite
arguable that every real problem does involve them
all, and never one, two, or three only. There are
two standard approaches to the handling of such
problems. The generally preferred and more
convincing one is so to set up experimental
situations that variability in phenomena due to
all but one of the types is negligibly small, so
that the statistics of a single series may be
employed. The other method employs
analysis-of-variance techniques and allocates to
each type the variability attributable to it. This
is a beautiful and powerful technique, but its
merit is sufficiently restricted by its necessary
complexity, when the data are complex, to warrant
a serious search for experimental set-ups
involving fewer issues.

* Ascertainable "truth" being relative, the word is used to
indicate a higher rather than a lower stage, and not an
absolute attainment.
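The allocation performed by analysis of variance can be sketched in its simplest, one-way form: the total sum of squared deviations from the grand mean splits exactly into a between-group part and a within-group part. The three groups of yields below are hypothetical, and the complex designs discussed in the text would involve more classifications than this single one.

```python
def partition_sum_of_squares(groups):
    """One-way analysis of variance: split the total sum of squared deviations
    from the grand mean into between-group and within-group parts."""
    values = [x for g in groups for x in g]
    grand_mean = sum(values) / len(values)
    group_means = [sum(g) / len(g) for g in groups]
    total = sum((x - grand_mean) ** 2 for x in values)
    between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    return total, between, within


# Hypothetical yields under three treatments, four plots each.
groups = [[8, 5, 2, 1], [9, 7, 6, 6], [12, 10, 11, 7]]
total, between, within = partition_sum_of_squares(groups)
print(total, between, within)  # 122.0 72.0 50.0
```

Dividing the between part by its degrees of freedom (k − 1 = 2) and the within part by its degrees of freedom (N − k = 9) gives the two mean squares whose ratio is Fisher's variance ratio.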

SECTION 7. EMPLOYABLE INSTRUMENTS IN THE QUEST FOR CERTAINTY

Correlative with each type of stability is a
statistic (or are statistics, in case alternative
methods are available) which meets the issue.
These become apparent with statistical expertness.
We give a few examples herewith. Temporal
phenomena as listed in Section 3, numbers (1) and
(2), are served by time charts, (3) by periodogram
analysis and periodic trigonometric functions,
(4) by ratios and index numbers, and (6) by growth
curves and other transformations. Geographic
phenomenon (1) is served by contour maps, (2) by
spot maps, and (3) and (4) by superimposed maps.
Qualitative phenomenon (1) is served by
proportions in classes and their standard errors
(standard errors are important in all
connections, but perhaps most strikingly so here),
(2) by correlations between class frequencies in
excess of that due to chance, and (3) by
association methods. Quantitative phenomenon (8)
is served by Lexis' ratios and Fisher's
variance-ratio tests, (9) by normalizing data and
other nonlinear transformations, (10) by sundry
measures of simple and multiple correlation, and
(9) and (10) by multivariate analysis, including
mental factor analysis.
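Periodogram analysis, named above for periodic fluctuations in temporal data, can be sketched as follows. The sketch assumes one common (Schuster-type) normalization; others are in use. For each trial period P, the deviations of the series from its mean are summed against a cosine and a sine of that period, and the squared sums are combined into an intensity. The monthly series is hypothetical, built with a known twelve-month cycle so that the peak can be checked.

```python
import math


def periodogram(series, periods):
    """Intensity for each trial period P:
    (2/n) * [(sum d_t cos 2*pi*t/P)**2 + (sum d_t sin 2*pi*t/P)**2],
    where d_t are deviations from the series mean."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    intensity = {}
    for p in periods:
        c = sum(d * math.cos(2 * math.pi * t / p) for t, d in enumerate(dev))
        s = sum(d * math.sin(2 * math.pi * t / p) for t, d in enumerate(dev))
        intensity[p] = 2 * (c * c + s * s) / n
    return intensity


# Hypothetical monthly series: level 10 with a twelve-month cycle of amplitude 3.
series = [10 + 3 * math.sin(2 * math.pi * t / 12) for t in range(120)]
spectrum = periodogram(series, range(2, 31))
best = max(spectrum, key=spectrum.get)
print(best, round(spectrum[best], 1))  # 12 540.0
```

With real data the peak is less sharp, and its height must be judged against the intensities that chance fluctuation alone would produce.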

This list is in no sense exhaustive, but just 
a start which the student will find it profitable 
to amplify as his knowledge of techniques ex- 
pands. 


CHAPTER VI 

MEASURES OF VARIABILITY 

SECTION 1. THE GENERAL CONCEPT OF VARIABILITY

It is axiomatic that if all measures of a 
series are and must be alike there is nothing to 
investigate or report. The primary phenomenon 
of a series is that there are differences between 
the measures. We will later see that a deduction 
from this is that the measures have a mean (see 
[6:03]). It seems to the writer that the concept 
of variability is primary and that of central 
tendency secondary, but, as will be shown, one 
cannot think of the one in any but the most 
casual manner without being compelled to think 
of the other. They are as wedded as are the 
facts that a right triangle has three sides such 
that the sum of the squares upon two is equal to 
that upon the third, and it has three angles 
such that the sum of two is equal to the third. 
We now turn our attention to certain necessary 
relationships between the measures in a series. 

To assist a student who has difficulty in 
dealing with algebraic symbols, a numerical illus- 
tration of quantities represented in Tables VI A, 
VI B, and VI C immediately accompanies each of
these tables. This numerical illustration need
not be noted unless desired. The illustrative
series is composed of the following four measures:
8, 5, 2, and 1, which have a mean of 4, a variance
of 7.5, and a variance of differences of 20.
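The illustrative figures can be confirmed by brute force. The sketch forms every entry of Table VI A for the series 8, 5, 2, 1 and then averages the squares of the N² − N non-null differences. It simply transcribes the definitions in the text; the function name is only illustrative.

```python
from itertools import product


def all_differences(xs):
    """Every entry of Table VI A: X_i - X_j for all ordered pairs, nulls included."""
    return [xi - xj for xi, xj in product(xs, repeat=2)]


xs = [8, 5, 2, 1]                      # the illustrative series of the text
n = len(xs)
diffs = all_differences(xs)            # N**2 = 16 entries, 4 of them null
mean = sum(xs) / n
variance = sum((x - mean) ** 2 for x in xs) / n
var_diff = sum(d * d for d in diffs) / (n * n - n)   # divide by N**2 - N
print(len(diffs), sum(diffs), mean, variance, var_diff)  # 16 0 4.0 7.5 20.0
```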

We will, then, start with the most elementary
concept of variability possible, namely, that two
measures of a series, $X_a$ and $X_b$, differ. We
will call the fact that $(X_a - X_b) = d_{ab}$, a
quantity not zero, the most elemental or primitive
measure of variability. If we have a series
$X_a, X_b, X_c, \ldots, X_n$, there are many such
measures, as given in Table VI A herewith, in which
the null measures of the type $(X_a - X_a)$ must
not be counted, but are included because they lead
to later simplifications. We can let $X$ stand for
any measure of the series $X_a, X_b, X_c, \ldots, X_n$,
but when it is necessary to distinguish between two
such measures we require a different notation, and
shall let $X_i$ stand for any of the $X$'s, $X_j$
for any measure not $X_i$, and $X_{j'}$ for any
measure including $X_i$.

TABLE VI A
ALL POSSIBLE DIFFERENCES FROM A SERIES OF N MEASURES

$$\begin{array}{ccccc}
X_a - X_a & X_b - X_a & X_c - X_a & \cdots & X_n - X_a\\
X_a - X_b & X_b - X_b & X_c - X_b & \cdots & X_n - X_b\\
X_a - X_c & X_b - X_c & X_c - X_c & \cdots & X_n - X_c\\
\vdots & \vdots & \vdots & & \vdots\\
X_a - X_n & X_b - X_n & X_c - X_n & \cdots & X_n - X_n
\end{array}$$

ILLUSTRATIVE NUMERICAL DATA

8-8   5-8   2-8   1-8
8-5   5-5   2-5   1-5
8-2   5-2   2-2   1-2
8-1   5-1   2-1   1-1

We clearly desire, as a comprehensive measure of
variability for the entire series, some function of
all possible differences $(X_i - X_j)$. There are
$N^2$ differences in Table VI A, but $N$ of them,
being null, do not count. We may thus write down
the following generalized measure of variability:

$$f = \frac{\sum^{n^2-n} |X_i - X_j|^m}{N^2 - N}$$
A generalized measure of variability [6:01]

For typographical reasons $N$ is written in lower
case when it appears as a superscript or subscript,
but otherwise as a capital. The $n^2 - n$ over the
$\Sigma$ symbol indicates that there are $N^2 - N$
terms in the summation. If nothing is written over
the $\Sigma$, the reader is to understand that
there are $N$ addends.

This function, $f$, is worthy of investigation
for different values of $m$. There may be beauties
in measures of variability as yet undiscovered and
unreported, but the writer has found the function
rather intractable for all values of $m$ except
$m = 2$. For this case $|X_i - X_j|^2 = (X_i - X_j)^2$,
and the function has very simple and remarkable
properties. The function, in this case, is a
variance, for, by definition, a variance is the
mean square of quantities whose mean is zero.
(A standard deviation is the square root of a
variance.) Since $(X_i - X_j) = -(X_j - X_i)$, and
for every $(X_i - X_j)$ in Table VI A there is in
some other position a $(X_j - X_i)$, it follows
that the mean of the differences in Table VI A is
0, and accordingly the mean of their squares is a
variance. We may write

$$Vd = \frac{1}{N^2 - N}\sum^{n^2} (X_i - X_{j'})^2$$

The expression $Vd$ stands for the variance of the
differences and not for $V$ times $d$. The symbol
$\sigma_d^2$ is commonly used and is appropriate,
but variances enter into formulas so frequently
that it is advantageous to have them represented
by a symbol without subscript or superscript; thus
$Vd$ is typographically simpler than its identical
equal $\sigma_d^2$. The summation term is exactly
the sum of the squares of the differences given in
Table VI A. For the first column we have the
magnitudes given in Table VI B.


TABLE VI B
SQUARES OF DIFFERENCES IN FIRST COLUMN OF TABLE VI A

$$\begin{array}{ll}
 & \text{Illustrative numerical data}\\
X_a^2 - 2X_aX_a + X_a^2 & 64 - 2\times 8\times 8 + 64\\
X_a^2 - 2X_aX_b + X_b^2 & 64 - 2\times 8\times 5 + 25\\
X_a^2 - 2X_aX_c + X_c^2 & 64 - 2\times 8\times 2 + 4\\
\vdots & \vdots\\
X_a^2 - 2X_aX_n + X_n^2 & 64 - 2\times 8\times 1 + 1
\end{array}$$

Summing these we obtain the first row of Table
VI C. The remaining rows are the sums
corresponding to the second, third, etc., columns
of Table VI A.

TABLE VI C
THE DIFFERENCES OF TABLE VI A SQUARED

$$\begin{array}{ll}
 & \text{Illustrative numerical data}\\
NX_a^2 - 2X_a\sum X_i + \sum X_i^2 & 4\times 64 - 2\times 8\times 16 + 94\\
NX_b^2 - 2X_b\sum X_i + \sum X_i^2 & 4\times 25 - 2\times 5\times 16 + 94\\
NX_c^2 - 2X_c\sum X_i + \sum X_i^2 & 4\times 4 - 2\times 2\times 16 + 94\\
\vdots & \vdots\\
NX_n^2 - 2X_n\sum X_i + \sum X_i^2 & 4\times 1 - 2\times 1\times 16 + 94
\end{array}$$

The sum of all these yields
$N\sum X_i^2 - 2\left(\sum X_i\right)^2 + N\sum X_i^2$,
yielding finally

$$Vd = \frac{2\left[N\sum X_i^2 - \left(\sum X_i\right)^2\right]}{N^2 - N}$$
Variance of differences between the measures in a series [6:02]

As, by definition, the mean, $M$, of a series is
the sum of the measures divided by their number,
we may write $\sum X = NM$, and [6:02] may be
written

$$Vd = \frac{2\left[\sum X^2 - NM^2\right]}{N - 1}$$
Variance of differences between the measures in a series [6:03]

As there is no ambiguity, the $i$ as a subscript
of $X$ has been dropped.

Though no mention of the mean was present in
the statement of the problem, we find it turning
up as an essential statistic in [6:03].

Computation by formula [6:03] is simple,
whereas it would be laborious by squaring the
values of Table VI A.
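The economy of formula [6:03] is easy to demonstrate. The sketch below computes the variance of differences in two ways. The first squares every entry of Table VI A, which takes about N² operations; the null entries contribute zeros, so the divisor stays N² − N. The second applies [6:03] directly, which takes about N operations. The 200 pseudo-random measures and the seed are arbitrary illustrative data.

```python
import random


def var_diff_brute(xs):
    """Square every entry of Table VI A (nulls add zero) and divide by N**2 - N."""
    n = len(xs)
    return sum((a - b) ** 2 for a in xs for b in xs) / (n * n - n)


def var_diff_formula(xs):
    """Formula [6:03]: 2 (sum X**2 - N M**2) / (N - 1)."""
    n = len(xs)
    m = sum(xs) / n
    return 2 * (sum(x * x for x in xs) - n * m * m) / (n - 1)


rng = random.Random(0)
data = [rng.uniform(0, 100) for _ in range(200)]
print(var_diff_brute(data), var_diff_formula(data))  # equal up to rounding
print(var_diff_brute([8, 5, 2, 1]), var_diff_formula([8, 5, 2, 1]))  # 20.0 20.0
```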

We may express $Vd$ in different notation. The
variance of the $X$'s is by definition the sum of
the squares of their differences from their mean
divided by their number. That this is a variance
in the sense defined above follows from the easily
proven and never-to-be-forgotten fact that
$\sum (X - M) = 0$. Thus by definition

$$V = \frac{\sum (X - M)^2}{N}$$
The variance of a distribution [6:04]

As a simple exercise the reader may prove that
$\sum (X - M)^2 = \sum X^2 - NM^2$, so that

$$V = \frac{\sum X^2 - NM^2}{N}$$
The variance of a distribution [6:05]

Also by definition

$$\sigma = \sqrt{V}$$
The standard deviation of a distribution [6:06]

We may now write [6:03] thus

$$Vd = \frac{2N}{N-1}\,V$$
Variance of differences between the measures in a series [6:07]

The most widely used measure of variability is
$V$ or its square root $\sigma$, but, because of
the factor $N/(N-1)$, there is not a constant
relationship between $Vd$ and $V$, though it is
nearly constant if $N$ is large. $Vd$ is the
preferred measure (see [6:08]) because
$\overline{Vd} = \widetilde{Vd}$ but $\bar{V} \neq \tilde{V}$.

The structure of the variance symbols should be
noted carefully. $V$ stands for variance. Had one
the true variance, or that for the infinite
population of which the $N$ cases are a random
sample, we would write it $\tilde{V}$. We never
have $\tilde{V}$ except hypothetically, for we
never deal with samples of infinite size. A bar
over a symbol has one of two meanings: it
indicates either a mean or an unbiased estimate.
Thus $\bar{V}$ is an unbiased estimate of the
population variance. The tilde to represent a
population value and the bar to represent an
unbiased estimate or a mean will be used
throughout this text.

Though the variance, $V$, obtained from the
sample, is not an unbiased estimate of the
population variance, the variance of the
differences given by the sample, $Vd$, is an
unbiased estimate of the variance of the
differences that would be found in the infinite
population.

$$\overline{Vd} = Vd$$
Unbiased estimate of the variance of the differences between the measures in the population [6:08]

The unbiased estimate of the population variance
is a function of the sample variance and the
number of cases in the sample, thus,

$$\bar{V} = \frac{N}{N-1}\,V$$
Unbiased estimate of population variance [6:09]

Letting $\bar{V}$ and $\overline{Vd}$ be the means
of the $V$'s and of the $Vd$'s found in an infinite
number of similar samples, [6:09] becomes [6:10],
showing the relationships between true measures of
variability.

$$\tilde{V} = \frac{N}{N-1}\,\bar{V} = \frac{\overline{Vd}}{2} = \frac{\widetilde{Vd}}{2}$$
[6:10]

Recapitulating the development, we note (a) that
we started with the elementary concept that the
variability of a series of measures is a function
of all the possible differences between them;
(b) that the algebraically simplest function of
this sort is the one yielding the variance of the
differences between the measures; (c) that the
computational procedure for obtaining this calls
for a new statistic, the mean; (d) that $Vd$ is
$2N/(N-1)$ times the variance, $V$, of the sample,
which is a universally employed statistic;
(e) that $Vd$ itself is an unbiased statistic
requiring no modification depending upon the size
of sample employed; (f) that one-half $Vd$, or its
equal $NV/(N-1)$, is an unbiased estimate of the
population variance.
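Points (d) to (f) can be checked by simulation. The sketch assumes a normal parent population with variance 1 and samples of N = 5, both arbitrary choices. Averaged over many samples, V should come out near (N − 1)/N = 0.8, NV/(N − 1) near 1, and Vd near 2, since the population variance of the difference between two independent measures is twice the population variance.

```python
import random


def average_estimates(n=5, reps=20000, sigma=1.0, seed=2):
    """Average V, N*V/(N-1) and Vd over many normal samples of size n."""
    rng = random.Random(seed)
    tot_v = tot_unbiased = tot_vd = 0.0
    for _ in range(reps):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        m = sum(xs) / n
        v = sum((x - m) ** 2 for x in xs) / n    # [6:04], divisor N (biased)
        tot_v += v
        tot_unbiased += n * v / (n - 1)          # [6:09]
        tot_vd += 2 * n * v / (n - 1)            # [6:07]
    return tot_v / reps, tot_unbiased / reps, tot_vd / reps


print(average_estimates())  # roughly (0.8, 1.0, 2.0)
```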

SECTION 2. DEGREES OF FREEDOM

Because of the simplicity with which it follows
from properties of $Vd$, the matter of degrees of
freedom is discussed at this point. If we have
three independent measures $X_a$, $X_b$, $X_c$,
knowledge of one conveys no information as to the
values of the others, singly or jointly, so we
have three degrees of freedom. In Table VI A there
are $N^2$ measures. How many of these are
independent? Obviously we know the exact value of
the null measures, so they provide no degrees of
freedom. Having $X_a - X_b$ gives us no
information as to $X_a - X_c, \ldots, X_a - X_n$,
so there are exactly $N - 1$ degrees of freedom
represented by the first column. The first measure
in the second column, $X_b - X_a$, is the negative
of $X_a - X_b$ and is thus not independent. The
third measure, $X_b - X_c$, is derivable from
measures in the first column, for it equals
$(X_a - X_c) - (X_a - X_b)$, and is thus not
independent. Nor are any of the remaining measures
in this column. Clearly there are no measures
independent of those in the first column to be
found anywhere in the entire table, so, though
there are $N^2$ measures in the table, there are
only $N - 1$ degrees of freedom. Also we may
observe that since there are $N - 1$ degrees of
freedom in $Vd$, and since

$$V = \sigma^2 = \frac{N-1}{2N}\,Vd$$

there are $N - 1$ degrees of freedom in $V$ and
also in $\sigma$.
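The count of degrees of freedom in Table VI A can be confirmed mechanically. Write each entry $X_i - X_j$ as a row of coefficients on $X_a, \ldots, X_n$. The number of independent entries is then the rank of that $N^2 \times N$ array. The sketch below computes the rank by exact Gaussian elimination with fractions; the helper names are hypothetical.

```python
from fractions import Fraction


def difference_rows(n):
    """Coefficients of every entry X_i - X_j of Table VI A in terms of the n measures."""
    rows = []
    for i in range(n):
        for j in range(n):
            row = [Fraction(0)] * n
            row[i] += 1
            row[j] -= 1          # null entries (i == j) become all-zero rows
            rows.append(row)
    return rows


def rank(rows):
    """Number of independent rows, by exact Gaussian elimination."""
    rows = [r[:] for r in rows]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((k for k in range(r, len(rows)) if rows[k][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for k in range(len(rows)):
            if k != r and rows[k][col] != 0:
                factor = rows[k][col] / rows[r][col]
                rows[k] = [a - factor * b for a, b in zip(rows[k], rows[r])]
        r += 1
    return r


for n in (3, 4, 10):
    print(n, rank(difference_rows(n)))  # independent differences: n - 1
```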

In the computation of $V$ we had
$V = \sum (X - M)^2 / N$, and we noted that this is
a biased statistic, but if we divide the summation
by the number of degrees of freedom instead of by
$N$, we have $\bar{V} = \sum (X - M)^2 / (N - 1)$,
which is an unbiased statistic. In general,
summations are to be divided by the number of
degrees of freedom, not by the number of cases, to
obtain unbiased statistics.

Is the mean an unbiased statistic?
$M = (X_a + X_b + X_c + \cdots + X_n)/N$. Since the
various $X$'s are independent, there are just $N$
degrees of freedom and $M$ is unbiased. We may
write

$$\bar{M} = M$$
An unbiased estimate of the population mean [6:11]


Certain statistical manipulations deduct more
degrees of freedom than do others. For example,
if we take differences of differences we have
$N - 2$ degrees of freedom, though the number of
such is $N^4$ when we include the null values, and
it is $N^4 - 2N^3 + N^2$ when they are excluded.

The student of statistics will find that this
matter is important in connection with small
samples and in connection with sundry derived
statistics wherein algebraic restrictions have
been imposed upon the data.

SECTION 3. THE VARIANCE AND THE STANDARD DEVIATION OF THE MEAN


We found that $Vd$ and $V$ are interchangeable in
the sense that, having one and the number of cases
in the sample, we can immediately obtain the
other. Thus either may be taken as a measure of
variability of the data. Just as variability of
the data is its chief phenomenon, so variability
of the mean (not its absolute value) is its prime
characteristic. What matters the value of a mean
if it is so quixotic that it cannot be trusted?
We shall now obtain the variance of the difference
between means and the variance of means. If $K$ is
the number of samples, we have $K$ means, and,
letting $V_M$ be the variance of these $K$ means,
obviously (see [6:07])

$$V(M_i - M_j) = \frac{2K}{K-1}\,V_M$$
[6:12]

and as the number of samples approaches infinity
we may write

$$V(M_i - M_j) = 2\tilde{V}_M$$
[6:13]

The error in any single mean, say $M_1$, is $M_1 - \tilde M$. We call this $\Delta_1$ and write $\Delta_1 = M_1 - \tilde M$. Clearly $\Sigma\Delta_i/K$ tends toward zero, for each $M_i$ is an unbiased estimate of $\tilde M$. By definition of a variance,

$$\tilde V_m = \frac{\Sigma\Delta_i^2}{K} \qquad (K \to \infty)$$

If $\tilde M$ is subtracted from each $X$ in Table VI A, the values of the differences are unchanged, for $[(X_i - \tilde M) - (X_j - \tilde M)] = X_i - X_j$. We now define $\tilde x$, a score as a deviation from the population mean, by

$$\tilde x = X - \tilde M \qquad \text{A deviation-from-the-true-mean score} \qquad [6:14]$$

For the first sample of $N$ cases we have $\Delta_1 = \Sigma\tilde x_1/N$, and similarly for successive samples. Since $\tilde x_i - \tilde x_j = X_i - X_j$, we may substitute in Table VI A and continue the development as before, obtaining the first equation of Table VI D as the formula comparable to [6:03].

The subscript 1 has been added to indicate that measures from the first sample only are involved. For the $K$ samples we have the equations of Table VI D.

TABLE VI D

GIVING VARIANCES OF DIFFERENCES BETWEEN MEASURES FOR K SUCCESSIVE SAMPLES OF N EACH

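$$(N-1)V_{d_1} = 2\Sigma\tilde x_1^2 - 2N\Delta_1^2$$

$$(N-1)V_{d_2} = 2\Sigma\tilde x_2^2 - 2N\Delta_2^2$$

$$(N-1)V_{d_3} = 2\Sigma\tilde x_3^2 - 2N\Delta_3^2$$

$$\cdots$$

$$(N-1)V_{d_k} = 2\Sigma\tilde x_k^2 - 2N\Delta_k^2$$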

We shall now sum these equations, as $K$ approaches infinity, first noting three simplifications: (1) that each of the $V_d$'s is an unbiased estimate of $\tilde V_d$, and so clearly their average approaches $\tilde V_d$; (2) that $\Sigma\tilde x_1^2 + \Sigma\tilde x_2^2 + \dots + \Sigma\tilde x_k^2 = \Sigma\tilde x^2$, which approaches $NK\tilde V$; and (3) that $\Sigma\Delta^2$ approaches $K\tilde V_m$. Thus we obtain

$$(N-1)K\tilde V_d = 2NK\tilde V - 2NK\tilde V_m$$

Utilizing the relationship $\tilde V_d = 2\tilde V$ (which follows from $V_d = [2N/(N-1)]V$ and the fact that, on the average, $V$ equals $[(N-1)/N]\tilde V$), we obtain

$$\tilde V_m = \frac{\tilde V}{N} \qquad \text{True variance error of the mean} \qquad [6:15]$$


If in place of the unknown $\tilde V$ we use an unbiased estimate of it, $[N/(N-1)]V$, we have the accompanying highly serviceable formula, in which $V_m$ is the usual and shorter notation for $V_M$:

$$V_m = \frac{V}{N-1} \qquad \text{Variance error of the mean} \qquad [6:16]$$

It would be more accurate to designate this as the "approximate variance error of the mean," but as every variance error, or standard error, formula based on the observed statistics is approximate, we omit the word "approximate," though the student should always be aware that a sample estimate has been employed in lieu of the unknown population statistic.

Noting [6:12], we may also write

$$V_{(M_i - M_j)} = \frac{2V}{N-1} \qquad \text{Variance error of difference between means from samples of } N \text{ each from a common parent population} \qquad [6:17]$$

As the square root of any variance error is a standard error, [6:16] yields

$$\sigma_m = \frac{\sigma}{\sqrt{N-1}} \qquad \text{Standard error of the mean} \qquad [6:18]$$

By a slight extension of [6:17] we have, for two means based upon samples of different size and drawn from different parent populations:

$$V_{(M_1 - M_2)} = \frac{V_1}{N_1 - 1} + \frac{V_2}{N_2 - 1} \qquad \text{Variance error of difference between independent means} \qquad [6:19]$$


Knowledge of $V_m$ or $\sigma_m$ is necessary to an understanding of the mean. Those who interpret means without knowledge of their standard errors assume, albeit unconsciously, that $\sigma_m$ is small, for otherwise the mean is untrustworthy. For every refined judgment and for every debatable issue this essential item must not be assumed, and it need not be, for [6:18] is available and simple to use.
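As a minimal sketch of how little arithmetic [6:16], [6:18], and [6:19] require, the functions below compute the variance error and standard error of a mean, and the variance error of a difference between independent means. The function names and the two samples are hypothetical illustrations, not data from the text.

```python
import math

def variance(xs):
    """V = Sigma (X - M)^2 / N, the divisor-N variance used in the text."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def variance_error_of_mean(xs):
    """[6:16]: V_m = V / (N - 1)."""
    return variance(xs) / (len(xs) - 1)

def standard_error_of_mean(xs):
    """[6:18]: sigma_m = sigma / sqrt(N - 1)."""
    return math.sqrt(variance_error_of_mean(xs))

def variance_error_of_difference(xs1, xs2):
    """[6:19]: independent means; sample sizes and parent populations may differ."""
    return variance_error_of_mean(xs1) + variance_error_of_mean(xs2)

# Hypothetical samples, for illustration only.
sample_a = [2, 4, 4, 4, 5, 5, 7, 9]
sample_b = [3, 6, 8, 8, 10]
print(standard_error_of_mean(sample_a))
print(math.sqrt(variance_error_of_difference(sample_a, sample_b)))
```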

The reader may frequently find formulas for the preceding standard and variance errors having $N$ instead of $N-1$ in the denominators. This procedure is not recommended (for $V$ is a better estimate of $[(N-1)/N]\tilde V$ than it is of $\tilde V$), though no error of interpretation is likely to result for samples even as small as $N = 12$. A standard or variance error is never needed to more than two significant figures, and a one-significant-figure answer usually suffices.

SECTION 4. SAMPLE AND POPULATION MOMENTS


Since the mean and variance are so mutually cooperative in revealing stable features of a distribution, let us consider a computational method yielding both. The standard method of computation provides the basis for getting higher moments as well. We define these as follows:

$$\frac{\Sigma(X-P)^k}{N} \qquad \text{The } k\text{'th moment from the fixed point } P \qquad [6:20]$$

$P$ may be any point not dependent upon the values of $X$ in the sample. A common case arises when $P = 0$. If, in this case, $k = 1$, then obviously [6:20] yields the mean. In other words, the mean is the first moment from the zero point of the scale of measurement of the $X$ scores. If we replace $P$ by the variable point $M$ (variable because the mean changes from sample to sample), we have the $k$'th moment from the mean, which is usually referred to simply as the $k$'th moment. If moments from some point $P$ other than $M$ are involved, they must always be designated moments from $P$.

$$\mu_k = \frac{\Sigma(X-M)^k}{N} \qquad k\text{'th moment} \qquad [6:21]$$


We note that the first moment equals 0; that the second moment equals $V$, which may be designated $\mu_2$; that the third moment, designated $\mu_3$, equals zero in a symmetrical distribution but not, in general, otherwise, and thus is a measure of asymmetry; and that the fourth moment, $\mu_4$, measures a variability characteristic unrelated to asymmetry and different from $V\,(=\mu_2)$ in that it is more subject to the influence of extreme measures than is $V$. This phenomenon, measured by the quotient $\mu_4/V^2$ (see [7:03]), is called the kurtosis of a distribution and is high when the distribution has long tails and a pronounced mode. Otherwise expressed, if the parts of the curve intermediate between the mode and the tails are called the hips, low, intermediate, and high hip heights correspond to leptokurtic, mesokurtic, and platykurtic distributions.
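The definitions [6:20] and [6:21] and the kurtosis quotient translate directly into code. This sketch (function names and scores hypothetical) computes moments from a fixed point, moments from the mean, and $\mu_4/V^2$.

```python
def moment_about(xs, point, k):
    """[6:20]: the k'th moment from a fixed point P."""
    return sum((x - point) ** k for x in xs) / len(xs)

def moment(xs, k):
    """[6:21]: the k'th moment from the mean, mu_k."""
    m = moment_about(xs, 0, 1)          # first moment from zero = the mean
    return moment_about(xs, m, k)

def kurtosis_quotient(xs):
    """mu_4 / V^2: high for long tails and a pronounced mode (3 for a normal curve)."""
    return moment(xs, 4) / moment(xs, 2) ** 2

scores = [2, 4, 4, 4, 5, 5, 7, 9]       # hypothetical data
print([moment(scores, k) for k in (1, 2, 3, 4)], kurtosis_quotient(scores))
```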

Higher moments beyond the fourth will not here concern us, though we will state without proof that the variance error of any moment depends upon moments up to one twice as high as itself. We have already found that the variance error of the first moment, the mean, depends upon $V$, the second moment. The variance of $V$ depends upon moments as high as $\mu_4$, and similarly for higher moments. An unhappy feature of the higher moments is their instability, as is immediately obvious from the fact that a single measure, if extreme, will greatly affect their values.

In moment notation the $\mu$'s stand for moments from the mean; thus $\mu_1 = 0$, a quantity that never needs to enter into a formula. However, the mean does enter in, and it has been customary to represent it in moment notation either as $\mu_1'$ or $m_1'$.* We shall avoid these and represent the mean by $M$.

* The notation here used is related to that of Fisher (1925 et seq. and 1928) and to that of Yule and Kendall (1937).

[Table: notation as here used, as used by Fisher, and as used by Yule and Kendall.]

It will be noticed that all theoretical, or population, statistics are here designated by tildes and that Fisher represents them by Greek letters. Fisher's sample $k$ statistics are unbiased estimates of the population $\kappa$ statistics and are closely related to Thiele's semi-invariants.

Except the mean, none of the raw moments are unbiased statistics. The accompanying formulas show the extent of this bias for all moments, and combinations of moments, up to the fourth degree. A vinculum over a statistic, or product of statistics, indicates a mean value, obtained by averaging for very many samples of $N$ each.


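$$\overline{M} = \tilde M \qquad [6:22]$$

$$\overline{M^2} = \tilde M^2 + \frac{\tilde V}{N} \qquad [6:23]$$

$$\overline{M^3} = \tilde M^3 + \frac{\tilde\mu_3}{N^2} + 3\,\frac{\tilde M\tilde V}{N} \qquad [6:24]$$

$$\overline{M^4} = \tilde M^4 + \frac{\tilde\mu_4 + 3(N-1)\tilde V^2}{N^3} + \frac{6\tilde M^2\tilde V}{N} + \frac{4\tilde M\tilde\mu_3}{N^2} \qquad [6:25]$$

$$\overline{V} = \frac{N-1}{N}\,\tilde V \qquad \text{See [6:10]}$$

$$\overline{V^2} = \frac{N-1}{N^3}\left[(N-1)\tilde\mu_4 + (N^2-2N+3)\tilde V^2\right] \qquad [6:26]$$

$$\overline{\mu_3} = \frac{(N-1)(N-2)}{N^2}\,\tilde\mu_3 \qquad [6:27]$$

$$\overline{\mu_4} = \frac{(N-1)(N^2-3N+3)}{N^3}\,\tilde\mu_4 + \frac{3(N-1)(2N-3)}{N^3}\,\tilde V^2 \qquad [6:28]$$

$$\overline{MV} = \frac{N-1}{N}\,\tilde M\tilde V + \frac{N-1}{N^2}\,\tilde\mu_3 \qquad [6:29]$$

$$\overline{M^2V} = \frac{N-1}{N}\,\tilde M^2\tilde V + \frac{N-1}{N^3}\left[2N\tilde M\tilde\mu_3 + \tilde\mu_4 + (N-3)\tilde V^2\right] \qquad [6:30]$$

$$\overline{M\mu_3} = \frac{(N-1)(N-2)}{N^2}\,\tilde M\tilde\mu_3 + \frac{(N-1)(N-2)}{N^3}\left(\tilde\mu_4 - 3\tilde V^2\right) \qquad [6:31]$$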


It is to be noted that for a normal distribution, in which $\tilde\mu_3 = 0$ and $\tilde\mu_4 = 3\tilde V^2$, most of the preceding formulas simplify greatly. After the student has studied correlation he will find the last three formulas useful in obtaining the correlation between mean and variance, between mean-squared and variance, and between mean and third moment.
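Formulas [6:10] and [6:26] through [6:28] can be checked exactly by enumerating every sample of $N$ from a small discrete parent, which is equivalent to independent random sampling from it. The sketch below does this for a hypothetical, deliberately skewed parent with $N = 4$. It is an illustrative check of our own, not part of the original derivation.

```python
from fractions import Fraction
from itertools import product

def moments(xs):
    """Return (M, V, mu3, mu4), each with divisor N as in [6:20]-[6:21]."""
    n = len(xs)
    m = Fraction(sum(xs), n)
    dev = [x - m for x in xs]
    return (m,) + tuple(sum(d ** k for d in dev) / n for k in (2, 3, 4))

# Hypothetical, deliberately skewed parent; listed values are equally likely.
population = [0, 0, 1, 5]
_, V_pop, mu3_pop, mu4_pop = moments(population)

N = 4
samples = list(product(population, repeat=N))   # all 256 samples of 4
stats = [moments(s) for s in samples]

def average(index):
    return sum(s[index] for s in stats) / len(stats)

avg_V, avg_mu3, avg_mu4 = average(1), average(2), average(3)
avg_V2 = sum(s[1] ** 2 for s in stats) / len(stats)

# Right-hand members of [6:10], [6:26], [6:27], and [6:28].
rhs_V = Fraction(N - 1, N) * V_pop
rhs_V2 = Fraction(N - 1, N ** 3) * ((N - 1) * mu4_pop + (N * N - 2 * N + 3) * V_pop ** 2)
rhs_mu3 = Fraction((N - 1) * (N - 2), N ** 2) * mu3_pop
rhs_mu4 = (Fraction((N - 1) * (N * N - 3 * N + 3), N ** 3) * mu4_pop
           + Fraction(3 * (N - 1) * (2 * N - 3), N ** 3) * V_pop ** 2)

print(avg_V == rhs_V, avg_V2 == rhs_V2, avg_mu3 == rhs_mu3, avg_mu4 == rhs_mu4)
```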

From formulas [6:22], [6:10], [6:27], [6:28], and [6:26] we compute the following unbiased estimates of the first four population moments:

$$M \qquad \text{An unbiased estimate of the population mean, see [6:11]}$$

$$\frac{N}{N-1}\,V \qquad \text{An unbiased estimate of the population variance, see [6:09]}$$

$$\frac{N^2}{(N-1)(N-2)}\,\mu_3 \qquad \text{An unbiased estimate of the population third moment} \qquad [6:32]$$

$$\frac{N\left[(N^2-2N+3)\mu_4 - 3(2N-3)V^2\right]}{(N-1)(N-2)(N-3)} \qquad \text{An unbiased estimate of the population fourth moment} \qquad [6:33]$$

For problems involving the combinations of statistics (especially if of the fourth or higher degree in $X$) from different samples, the student will find substantial economy in technique and thought by employing Fisher's (1928 and 1934) $k$-statistics. The first four $k$-statistics are given herewith, in which $\mu_3$ and $\mu_4$ are as here defined and not as defined by Fisher (see footnote, Ch. VI, p. 212).

$$k_1 = M \qquad \text{Fisher's } k_1 \qquad [6:34]$$

$$k_2 = \frac{N}{N-1}\,V \qquad \text{Fisher's } k_2 \qquad [6:35]$$

$$k_3 = \frac{N^2}{(N-1)(N-2)}\,\mu_3 \qquad \text{Fisher's } k_3 \qquad [6:36]$$

$$k_4 = \frac{N^2\left[(N+1)\mu_4 - 3(N-1)V^2\right]}{(N-1)(N-2)(N-3)} \qquad \text{Fisher's } k_4 \qquad [6:37]$$

These $k$-statistics are unbiased estimates of Fisher's kappa, or population, statistics $\kappa_1$, $\kappa_2$, $\kappa_3$, and $\kappa_4$, called cumulants.
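A sketch of [6:33] and of the $k$-statistics [6:34] to [6:37], written in terms of this chapter's $M$, $V$, $\mu_3$, and $\mu_4$. The function names are hypothetical; exact fractions are used so that unbiasedness can be confirmed by averaging over every sample of 4 from a small, hypothetical parent.

```python
from fractions import Fraction
from itertools import product

def sample_moments(xs):
    """(N, M, V, mu3, mu4) with divisor N, as in this chapter."""
    n = len(xs)
    m = Fraction(sum(xs), n)
    V, mu3, mu4 = (sum((x - m) ** k for x in xs) / n for k in (2, 3, 4))
    return n, m, V, mu3, mu4

def k_statistics(xs):
    """Fisher's k-statistics [6:34]-[6:37] in terms of M, V, mu3, mu4 (needs N >= 4)."""
    n, m, V, mu3, mu4 = sample_moments(xs)
    k1 = m
    k2 = Fraction(n, n - 1) * V
    k3 = Fraction(n * n, (n - 1) * (n - 2)) * mu3
    k4 = Fraction(n * n, (n - 1) * (n - 2) * (n - 3)) * ((n + 1) * mu4 - 3 * (n - 1) * V ** 2)
    return k1, k2, k3, k4

def unbiased_fourth_moment(xs):
    """[6:33]: unbiased estimate of the population fourth moment (needs N >= 4)."""
    n, _, V, _, mu4 = sample_moments(xs)
    return (Fraction(n, (n - 1) * (n - 2) * (n - 3))
            * ((n * n - 2 * n + 3) * mu4 - 3 * (2 * n - 3) * V ** 2))

# Check unbiasedness by averaging over every sample of 4 from a hypothetical parent.
parent = [0, 0, 1, 5]
_, _, V_p, mu3_p, mu4_p = sample_moments(parent)
samples = list(product(parent, repeat=4))
mean_k = [sum(k_statistics(s)[j] for s in samples) / len(samples) for j in range(4)]
print(mean_k[1] == V_p, mean_k[2] == mu3_p, mean_k[3] == mu4_p - 3 * V_p ** 2)
```

The last comparison uses [6:41] in the form $\kappa_4 = \tilde\mu_4 - 3\kappa_2^2$.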

SECTION 5. CUMULANTS AND k-STATISTICS
The relationships between the first four cumulants and population moments are:

$$\tilde M = \kappa_1 \qquad [6:38]$$

$$\tilde V = \kappa_2 \qquad [6:39]$$

$$\tilde\mu_3 = \kappa_3 \qquad [6:40]$$

$$\tilde\mu_4 = \kappa_4 + 3\kappa_2^2 \qquad [6:41]$$


Some of the merits of cumulants and $k$-statistics can be surmised from the following quotation from Fisher (1934, p. 22): "it is to [Laplace] we owe the principle that the distribution of a quantity compounded of independent parts shows a whole series of features, mean, variance, and other cumulants, which are simply the sums of like features of the distributions of the parts."
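Laplace's principle, in the cumulant form [6:38] to [6:41], can be illustrated with two independent discrete "parts". The cumulants of their sum equal the sums of their cumulants, while the fourth moment $\kappa_4 + 3\kappa_2^2$ is not additive. The two distributions below are hypothetical and were chosen only to be skewed.

```python
from fractions import Fraction
from collections import defaultdict

def cumulants(dist):
    """kappa_1..kappa_4 of a discrete distribution {value: probability},
    via [6:38]-[6:41]: kappa1 = M, kappa2 = V, kappa3 = mu3, kappa4 = mu4 - 3V^2."""
    m = sum(p * x for x, p in dist.items())
    mu2, mu3, mu4 = (sum(p * (x - m) ** k for x, p in dist.items()) for k in (2, 3, 4))
    return (m, mu2, mu3, mu4 - 3 * mu2 ** 2)

def distribution_of_sum(d1, d2):
    """Distribution of X1 + X2 when X1 and X2 are independent."""
    out = defaultdict(Fraction)
    for x1, p1 in d1.items():
        for x2, p2 in d2.items():
            out[x1 + x2] += p1 * p2
    return dict(out)

# Two hypothetical, skewed "parts".
part_a = {0: Fraction(1, 2), 1: Fraction(1, 3), 4: Fraction(1, 6)}
part_b = {0: Fraction(3, 4), 2: Fraction(1, 4)}
whole = distribution_of_sum(part_a, part_b)

added = tuple(a + b for a, b in zip(cumulants(part_a), cumulants(part_b)))
print(cumulants(whole) == added)
```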

In an important sense all the moments beyond 
the first are measures of variability, but they 
measure quite different phenomena and we shall 
follow the custom of referring to the standard 
deviation, the variance, or some comparable 
measure, when speaking of variability, and employ
the terms skewness and kurtosis for such special 
aspects of variability as are revealed by asymme- 
try and height at the hips. 

SECTION 6. THE COMPUTATION OF THE MEAN AND OF MOMENTS

Let us now consider computational methods for getting the moments. Raw scores, designated $X$-scores; deviation-from-mean scores (sometimes abridged to "deviation scores"), designated $x$-scores; and standard scores, or deviation-from-mean-in-terms-of-the-standard-deviation scores, designated $z$-scores ($z = x/\sigma$), have their respective merits. (Standard scores are designated $x$-scores in normal probability tables, for $z$ is there used to indicate the ordinate. See [8:23].) Final outcomes must be expressed in terms of $X$-scores because they are the observed measures. Algebraic derivations are frequently simplified by employing $x$ or $z$ scores. None of these serves most computational needs well, for the benefits of grouping and of employing non-fractional values are not attained by them. We therefore employ a new variable, $\xi$, which is a score expressed as a deviation from an arbitrary origin in terms of the grouping interval, $i$, employed. We have



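$$\xi = \frac{X - \text{Arb. or.}}{i}, \quad\text{or}\quad X = \text{Arb. or.} + i\xi \qquad \text{Relation between } X \text{ and } \xi \text{ scores} \qquad [6:42]$$

If $M_\xi$ is the mean of the $\xi$ scores, we also have

$$M = \text{Arb. or.} + iM_\xi \qquad \text{Mean computed via } \xi \text{ scores} \qquad [6:43]$$

$$V = i^2V_\xi, \quad\text{also}\quad \sigma = i\sigma_\xi \qquad \text{Variability computed via } \xi \text{ scores} \qquad [6:44]$$

We also have

$$x = i(\xi - M_\xi), \quad\text{or}\quad \xi = M_\xi + \frac{x}{i} \qquad \text{Relation between } x \text{ and } \xi \text{ scores} \qquad [6:45]$$

From [6:45] we easily derive

$$V = i^2\left(\frac{\Sigma\xi^2}{N} - M_\xi^2\right) \qquad \text{The variance computed via } \xi \text{ scores} \qquad [6:46]$$

$$\mu_3 = i^3\left(\frac{\Sigma\xi^3}{N} - 3M_\xi\frac{\Sigma\xi^2}{N} + 2M_\xi^3\right) \qquad [6:47]$$

$$\mu_4 = i^4\left(\frac{\Sigma\xi^4}{N} - 4M_\xi\frac{\Sigma\xi^3}{N} + 6M_\xi^2\frac{\Sigma\xi^2}{N} - 3M_\xi^4\right) \qquad [6:48]$$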
An important property of the arithmetic mean readily follows from [6:46] by considering the case in which the grouping interval equals one. Then

$$V = \frac{\Sigma x^2}{N} = \frac{\Sigma\xi^2}{N} - M_\xi^2, \quad\text{or}\quad \Sigma x^2 = \Sigma\xi^2 - NM_\xi^2,$$

immediately proving that $\Sigma x^2 < \Sigma\xi^2$ whenever $M_\xi$, the distance from arbitrary origin to mean, does not equal zero. That is, the sum of squared deviations is a minimum when the point from which the deviations are taken is the mean.
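With grouping interval one and origin $P$, the identity behind this proof is $\Sigma\xi^2 = \Sigma x^2 + NM_\xi^2$, where $M_\xi = M - P$. A short numerical check on hypothetical scores, trying several origins:

```python
def sum_sq_about(xs, point):
    return sum((x - point) ** 2 for x in xs)

scores = [3, 7, 8, 12, 15]          # hypothetical scores; mean = 9
N = len(scores)
M = sum(scores) / N
for origin in (5, 8, 9, 10.5):
    M_xi = M - origin               # distance from the origin to the mean (interval = 1)
    # Sigma xi^2 = Sigma x^2 + N * M_xi^2, so the sum is least when the origin is the mean.
    print(origin, sum_sq_about(scores, origin), sum_sq_about(scores, M) + N * M_xi ** 2)
```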

We now have two unique properties of the mean, $\Sigma x = 0$ and $\Sigma x^2 =$ a minimum, which are the cause of its algebraic simplicity and its general superiority as a measure of central tendency. Formulas [6:43], [6:46], [6:47], and [6:48] are desirable work formulas in that the arithmetic may be made simple by an appropriate choice of the arbitrary origin and the grouping interval. If the grouping is coarse, there is a systematic error due to grouping in the even moments (but not in the odd moments) which, for certain forms of distribution, may be allowed for by applying Sheppard's corrections as given in formulas [6:49] and [6:50], wherein ${}_sV$ and ${}_s\mu_4$ are the moments after applying Sheppard's corrections.

$${}_sV = V - \frac{i^2}{12} \qquad [6:49]$$

$${}_s\mu_4 = \mu_4 - \frac{i^2}{2}V + \frac{7i^4}{240} \qquad [6:50]$$

Sheppard's corrections to the second and fourth moments.


Because of lack of universal applicability, and because they would interfere with tests of significance, it is very desirable to employ such a grouping that the corrections are negligible and may therefore be omitted. Sheppard's correction in $\sigma$ amounts to one per cent when $i = .49\sigma$, and the correction in $\mu_4$ is one per cent when $i$ is approximately $.25\sigma$. As, for most of the purposes of social statistics, the standard deviation is the crucial measure of variability and a one per cent error in this measure is immaterial, we will in general endeavor to group for computational purposes so that $i$ is not greater than $.5\sigma$, a rule substantially equivalent to one calling for 12 or more classes. The reader will appreciate that in situations calling for more than common precision, and in situations wherein the argument depends on higher moments than $V$, either a finer grouping than this should be employed or, in cases not involving tests of significance, Sheppard's corrections should be employed.
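A minimal sketch of [6:49] and [6:50] applied to the Table VI E moments ($i = 3°$). It also checks the statement that the correction in $\sigma$ reaches one per cent near $i = .49\sigma$. Whether the corrections are appropriate at all depends on the form of distribution, as the text cautions. The function name is hypothetical.

```python
import math

def sheppard(V, mu4, i):
    """[6:49]-[6:50]: Sheppard-corrected second and fourth moments."""
    return V - i ** 2 / 12, mu4 - (i ** 2 / 2) * V + 7 * i ** 4 / 240

# Table VI E values (interval i = 3 degrees).
V, mu4, i = 38.8845, 7106.7375, 3
sV, s_mu4 = sheppard(V, mu4, i)
shrink_in_sigma = 1 - math.sqrt(sV / V)          # relative change in sigma

# Interval, as a multiple of sigma, at which the sigma correction is exactly 1 per cent:
# sigma^2 - i^2/12 = (.99 sigma)^2  =>  i = sigma * sqrt(12 * (1 - .99^2))
i_for_one_percent = math.sqrt(12 * (1 - 0.99 ** 2))
print(round(sV, 4), round(s_mu4, 4), round(shrink_in_sigma, 4), round(i_for_one_percent, 3))
```

Here $i/\sigma$ is about .48, so the correction to $\sigma$ is just under one per cent, in line with the grouping rule above.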

The computation of the first four moments for the temperature data of Table IV J, $i = 3°$ column, is shown herewith.

TABLE VI E

THE COMPUTATION OF M, V, μ₃, AND μ₄

     X      f      ξ       fξ      fξ²      fξ³      fξ⁴
    99      2      5       10       50      250     1250
    96      2      4        8       32      128      512
    93      0      3
    90      1      2        2        4        8       16
    87      6      1        6        6        6        6
    84     13      0     (26)             (392)
    81     23     -1      -23       23      -23       23
    78      5     -2      -10       20      -40       80
    75      6     -3      -18       54     -162      486
    72      1     -4       -4       16      -64      256
    69      1     -5       -5       25     -125      625
    66      2     -6      -12       72     -432     2592
                        (-72)            (-846)
  N = 62                  -46      302     -454     5846

Figures in parentheses are subtotals of the positive and of the negative entries; the bottom row gives N and the column totals.
$$M_\xi = \frac{\Sigma f\xi}{N} = -.7419$$

$$V_\xi = \frac{\Sigma f\xi^2}{N} - M_\xi^2 = 4.3205, \quad\text{and}\quad \sigma_\xi = 2.0786$$

$$\mu_{3\xi} = \frac{\Sigma f\xi^3}{N} - 3M_\xi\frac{\Sigma f\xi^2}{N} + 2M_\xi^3 = 2.7024$$

$$\mu_{4\xi} = \frac{\Sigma f\xi^4}{N} - 4M_\xi\frac{\Sigma f\xi^3}{N} + 6M_\xi^2\frac{\Sigma f\xi^2}{N} - 3M_\xi^4 = 87.7375$$

Since the interval is 3°, we obtain

$M = \text{Arb. or.} + iM_\xi = 81.7742$, which, as explained later, should be published as 81.8.

$V = i^2V_\xi = 38.8845$, published as 39.

$\sigma = i\sigma_\xi = 6.2358$, published as 6.2.

$\mu_3 = i^3\mu_{3\xi} = 72.9648$, published as $7 \times 10$.

$\mu_4 = i^4\mu_{4\xi} = 7106.7375$, published as $7.1 \times 10^3$.
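The arithmetic of Table VI E can be reproduced in a few lines, as a check on the tabled sums and on [6:42] to [6:48]. The class marks and frequencies are taken from the table; the variable names are our own.

```python
import math

# Table VI E: class marks X (interval i = 3 degrees) and frequencies f.
X = [99, 96, 93, 90, 87, 84, 81, 78, 75, 72, 69, 66]
f = [2, 2, 0, 1, 6, 13, 23, 5, 6, 1, 1, 2]
arb_origin, i = 84, 3

xi = [(x - arb_origin) // i for x in X]     # [6:42]; exact, since every X - 84 is a multiple of 3
N = sum(f)
S = {k: sum(fk * g ** k for fk, g in zip(f, xi)) for k in (1, 2, 3, 4)}   # sums of f*xi^k

M_xi = S[1] / N
V_xi = S[2] / N - M_xi ** 2                                    # [6:46] before multiplying by i^2
mu3_xi = S[3] / N - 3 * M_xi * S[2] / N + 2 * M_xi ** 3        # [6:47] before multiplying by i^3
mu4_xi = (S[4] / N - 4 * M_xi * S[3] / N
          + 6 * M_xi ** 2 * S[2] / N - 3 * M_xi ** 4)           # [6:48] before multiplying by i^4

M = arb_origin + i * M_xi        # [6:43]
V = i ** 2 * V_xi                # [6:44]
sigma = i * math.sqrt(V_xi)
mu3 = i ** 3 * mu3_xi
mu4 = i ** 4 * mu4_xi
print(N, S)
print(round(M, 4), round(V, 4), round(sigma, 4), round(mu3, 4), round(mu4, 2))
```

Carried at full precision, $\mu_4$ comes to about 7106.76; the text's 7106.7375 is $81 \times 87.7375$, the rounded $\mu_{4\xi}$ times $i^4$.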

It is seen that this method of moments, using grouped data and an arbitrary origin, has involved quite simple arithmetic. Also the work involved in getting the variance contributes directly to getting the higher moments. Though this method is universally serviceable, some prefer a method based upon summations, for adding machines are more frequently available than multiplying machines. We illustrate a summation method, Elderton (1905-6), in Table VI F. Though it involves larger figures than the method of moments, it is applicable to grouped data, involves positive summations only, provides a check upon all summations since values appear twice, and involves only such additions, with totals and subtotals, as can be printed on an adding machine tape, so that the large numbers involved impose little mental tax.

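TABLE VI F

THE COMPUTATION OF M, V, μ₃, AND μ₄ BY MEANS OF SUMMATIONS

     X      f     ξ'     fξ'   Cum. 1   Cum. 2   Cum. 3   Cum. 4
    99      2     11      22       22       22       22       22
    96      2     10      20       42       64       86      108
    93      0      9       0       42      106      192      300
    90      1      8       8       50      156      348      648
    87      6      7      42       92      248      596     1244
    84     13      6      78      170      418     1014     2258
    81     23      5     115      285      703     1717     3975
    78      5      4      20      305     1008     2725     6700
    75      6      3      18      323     1331     4056    10756
    72      1      2       2      325     1656     5712    16468
    69      1      1       1      326     1982     7694    24162
                                  =S₁      =S₂      =S₃      =S₄
    66      2      0
  N = 62
  Column totals:             326     1982     7694    24162

Each Cum. column cumulates the column to its left from the top down. The last entries are S₁ to S₄, and the totals of the fξ', Cum. 1, Cum. 2, and Cum. 3 columns, in the bottom row, check them.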



A brief examination of Table VI F suffices to reveal the mode of computation. The various summations from the arbitrary origin (here 66) are immediately obtainable from $S_1$, $S_2$, $S_3$, and $S_4$, from the relationships herewith:

$$\Sigma f\xi' = S_1 \qquad [6:51]$$

$$\Sigma f\xi'^2 = S_2 \qquad [6:52]$$

$$\Sigma f\xi'^3 = 2S_3 - S_2 \qquad [6:53]$$

$$\Sigma f\xi'^4 = 6S_4 - 6S_3 + S_2 \qquad [6:54]$$
Having the various sums of powers of $\xi'$, substitution in [6:43], [6:44], [6:46], [6:47], and [6:48] yields the identical statistics already obtained by the method of moments, Table VI E. The summation method is well adapted to Hollerith and Powers machine computation.
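A sketch of the summation method of Table VI F, assuming (as the table's arithmetic indicates) that the column $f\xi'$ is cumulated from the top four times in succession, with $S_1$ to $S_4$ the last entries of the four cumulated columns. The relations [6:51] to [6:54] then recover the power sums, which the check compares with direct computation. The function name is hypothetical.

```python
def elderton_S(fxi_top_down, levels=4):
    """Successively cumulate the f*xi' column from the top, as in Table VI F.

    fxi_top_down: f*xi' for every class from the highest down to xi' = 1
    (classes with zero frequency included).  S_k is the last entry of the
    k'th cumulated column."""
    column, S = list(fxi_top_down), []
    for _ in range(levels):
        running, cumulated = 0, []
        for value in column:
            running += value
            cumulated.append(running)
        S.append(cumulated[-1])
        column = cumulated
    return S

# Table VI F: X from 99 down to 69, xi' = (X - 66)/3 = 11, 10, ..., 1.
f = [2, 2, 0, 1, 6, 13, 23, 5, 6, 1, 1]
xi = list(range(11, 0, -1))
S1, S2, S3, S4 = elderton_S([fk * x for fk, x in zip(f, xi)])

power_sums = {                      # [6:51]-[6:54]
    1: S1,
    2: S2,
    3: 2 * S3 - S2,
    4: 6 * S4 - 6 * S3 + S2,
}
N = 62                              # includes the 2 cases at X = 66, where xi' = 0
M = 66 + 3 * power_sums[1] / N      # [6:43] with origin 66, i = 3
print(S1, S2, S3, S4, power_sums, round(M, 4))
```

Only additions are needed until the last step, which is the point of the method on an adding machine.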

A caution is necessary in using this method, or any method, involving large positive and negative quantities which nearly cancel. For example,

$$\mu_{3\xi'} = 216.2258 - 504.2654 + 290.7420 = 2.7024,$$

which is a five-figure answer though the addends are of seven figures. To secure computational accuracy to five figures in the final answer has required computational accuracy to seven figures in the parts.

SECTION 7. DECIMAL PLACES TO BE KEPT IN PUBLISHED RESULTS

It is absurd to publish as many figures in final answers as have been here employed in the illustrative computations. The mean of the first three temperatures of Table IV B, 80°, 88°, and 74°, is 80.666666... ad infinitum. The best evidence that we have from these three observations as to the population mean temperature is this answer kept to an infinite number of decimal places. Any rounding off whatsoever, e.g., 80.6666667, constitutes a deviation from the best possible answer yielded by the data. If, however, the confidence to be placed in the best answer (which is very small because only three observations have been utilized) differs from that to be placed in a rounded-off answer by a scarcely sensible amount, the rounded-off answer serves every purpose. Furthermore, if the rounding-off is done just at the decimal place where confidence should flag, the rounded-off answer is more informative than the non-abridged answer. Following this principle, the mean of the three measures should be published as 81°, implying that there is doubt as to the trustworthiness of the units figure, i.e., of the last figure published.

There is serious doubt as to any magnitude as small as one-third of a standard error: the chances of a magnitude as great as this arising merely as a matter of chance are 74 in 100. We therefore adopt as a practical rule the practice of terminating a published statistic with the decimal place given by the first figure of one-third its standard error. This rule, which we shall call the "one-third sigma rule," is equally applicable to original observations and to derived statistics. To apply it we must estimate standard errors where we cannot compute them. For all the more common and more important generalizing statistics, formulas giving standard errors are available.
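The one-third sigma rule is mechanical enough to code. The sketch below (function name hypothetical) first confirms the "74 in 100" figure from the normal curve, then rounds each statistic of the temperature example at the place of the first figure of one-third its standard error, using the standard errors obtained for these data in Section 8.

```python
import math
from statistics import NormalDist

# Chance that a purely random normal deviation is at least one-third of a standard error.
chance = 2 * (1 - NormalDist().cdf(1 / 3))

def publish(value, standard_error):
    """Round `value` at the decimal place of the first significant figure of SE/3."""
    place = math.floor(math.log10(standard_error / 3))   # e.g. -1 means tenths
    return round(value, -place)

# Statistics and standard errors from this chapter's temperature example.
for value, se in [(81.7742, .79), (38.8845, 9.7), (6.2358, .78), (72.9648, 76), (7106.7375, 1897)]:
    print(value, "->", publish(value, se))
print(round(chance, 3))
```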


SECTION 8. VARIANCE ERRORS AND STANDARD ERRORS OF MOMENTS 

Since a standard error is the square root of 
a variance error, it suffices to give formulas 
for variance errors. 


Since the variance of the variance is $(\overline{V^2} - \overline{V}^{\,2})$, we may utilize [6:26] and [6:10] and obtain

$$V_V = \frac{N-1}{N^3}\left[(N-1)\tilde\mu_4 - (N-3)\tilde V^2\right] \qquad \text{Variance of variance (samples of any size and parent populations of any form)} \qquad [6:55]$$


Since the sample $\mu_4$ and $V^2$ are unbiased estimates of $\overline{\mu_4}$ and $\overline{V^2}$, and not of $\tilde\mu_4$ and $\tilde V^2$, an answer in terms of $\overline{\mu_4}$ and $\overline{V^2}$, for which we can substitute $\mu_4$ and $V^2$ without introducing any systematic bias, is desirable. We can derive such a formula by utilizing [6:10], [6:26], and [6:28]. The final result is

$$V_V = \frac{1}{N(N-2)(N-3)}\left[(N-1)^2\mu_4 - (N^2-3)V^2\right] \qquad \text{Variance error of sample variance (samples of any size and parent population of any form)} \qquad [6:56]$$


When $N$ is large this reduces to

$$V_V = \frac{\mu_4 - V^2}{N} \qquad \text{Variance error of sample variance, } N \text{ large} \qquad [6:57]$$

When the parent population is normal, [6:56] reduces to

$$V_V = \frac{2V^2}{N+1} \qquad \text{Variance of variance when population is normal (see Chapter XIII)} \qquad [6:58]$$

In Chapter XIII, Section 8, the variance of the standard deviation is derived. It is

$$V_\sigma = \frac{V_V}{4V} \qquad \text{Variance of } \sigma,\ N \text{ large} \qquad [6:59]$$

When $N$ is large and the population normal, this leads to

$$\sigma_\sigma = \frac{\sigma}{\sqrt{2N}} \qquad \text{Standard error of } \sigma,\ N \text{ large and population normal} \qquad [6:60]$$


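Small sample formulas for $V_{\mu_3}$ and $V_{\mu_4}$ become complicated, but the following, given by Pearson (1902-1903 and 1914), are frequently serviceable:

$$V_{\mu_3} = \frac{1}{N}\left(\tilde\mu_6 - \tilde\mu_3^2 - 6\tilde\mu_4\tilde V + 9\tilde V^3\right) \qquad \text{Variance of third moment, } N \text{ large} \qquad [6:61]$$

from which we obtain, approximately,

$$V_{\mu_3} = \frac{6V^3}{N-1} \qquad \text{Variance error of third moment, } N \text{ large and population normal} \qquad [6:62]$$

$$V_{\mu_4} = \frac{1}{N}\left(\tilde\mu_8 - \tilde\mu_4^2 - 8\tilde\mu_5\tilde\mu_3 + 16\tilde V\tilde\mu_3^2\right) \qquad \text{Variance of fourth moment, } N \text{ large} \qquad [6:63]$$

from which we obtain, approximately,

$$V_{\mu_4} = \frac{96V^4}{N-1} \qquad \text{Variance error of fourth moment, } N \text{ large and population normal} \qquad [6:64]$$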


We may use these formulas to estimate the 
variance errors, and, by extracting the square 
roots, the standard errors, of the several sta- 
tistics computed from the data of Table VI E, or 
Table VI F. 

By [6:18] we find that $\sigma_m = .79$, and since one-third of this is .3, the mean is published as 81.8, indicating that the .8 is in doubt though it is better than a random guess.

By [6:56], which is the master or most trustworthy formula we have for $V_V$, we find $V_V = 94$, or $\sigma_V = 9.7$. Also, utilizing [6:59], we find $\sigma_\sigma = .78$. We accordingly publish $V$ as 39. and $\sigma$ as 6.2.
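The figures just quoted can be reproduced from the Table VI E moments. This sketch evaluates [6:56], [6:59], and [6:60], and also [6:62] and [6:64] with $N-1$ in the denominator, which is the form that reproduces the values 5783 and 3597885 quoted for these data; that reading of the two formulas is our assumption.

```python
import math

# Moments of the temperature data (Table VI E); N = 62.
N, V, mu4 = 62, 38.8845, 7106.7375
sigma = math.sqrt(V)

V_V = ((N - 1) ** 2 * mu4 - (N ** 2 - 3) * V ** 2) / (N * (N - 2) * (N - 3))   # [6:56]
sigma_V = math.sqrt(V_V)
sigma_sigma = math.sqrt(V_V / (4 * V))          # [6:59], N large
sigma_sigma_normal = sigma / math.sqrt(2 * N)   # [6:60], N large, parent normal

# [6:62] and [6:64] as the text applies them (assumed denominator N - 1).
V_mu3 = 6 * V ** 3 / (N - 1)
V_mu4 = 96 * V ** 4 / (N - 1)

for name, value in [("V_V", V_V), ("sigma_V", sigma_V), ("sigma_sigma", sigma_sigma),
                    ("sigma_sigma (normal)", sigma_sigma_normal),
                    ("V_mu3", V_mu3), ("V_mu4", V_mu4)]:
    print(name, round(value, 2))
```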

It is informative to see what the approximate formulas [6:58] and [6:60] yield. This latter is an important formula in that it does not involve the third or fourth moments and is widely used. By [6:58] we obtain, to two figures, the same answers as before, suggesting that [6:58] is satisfactory if $N$ is as large as 62. Knowledge of the first significant figure of a standard error usually suffices to reveal the trust that can be placed in a statistic. Formula [6:60] yields .56. This value, though definitely too small (because the assumption of normality was unsound), is nevertheless much to be preferred to no estimate at all as to the trustworthiness of $\sigma$.

Not having fifth, sixth, and eighth moments, we cannot substitute sample moments for the unknown population moments and compute the variance errors of the third and fourth moments by the more accurate formulas [6:61] and [6:63]. We shall use the short formulas [6:62] and [6:64], knowing that the resulting values will be much too small. By [6:62] we obtain $V_{\mu_3} = 5783.$ (to be published as $6. \times 10^3$, thereby revealing uncertainty as to the first figure), $\sigma_{\mu_3} = 76.$ ($= 8. \times 10$), and $\sigma_{\mu_3}/3 = 25$. Though this standard error is undoubtedly much too small, it is still probable that the first figure of $\mu_3$ is better than a guess, so we publish it as $7. \times 10$. By [6:64] we obtain $V_{\mu_4} = 3597885.$ ($= 4. \times 10^6$), $\sigma_{\mu_4} = 1897$ ($= 2. \times 10^3$), and $\sigma_{\mu_4}/3 = 632$ ($= 6 \times 10^2$). Though this value be too small, it may be that the first two figures of $\mu_4$ have some merit, so we publish it as $7.1 \times 10^3$.

The unreliability here found for the higher moments is somewhat larger for these leptokurtic data than would be the case with a sample of equal size drawn from a normal population, but in general the higher moments are unreliable, so where possible crucial issues should be made to depend on statistics that do not involve them.

With reference to the reliability of estimates of the population variance and standard deviation, we note that since the estimate of $\tilde V$ is $[N/(N-1)]V$ and that of $\tilde\sigma$ is $\sqrt{N/(N-1)}\,\sigma$, the standard error of the former can be gotten by multiplying that of $V$ by $N/(N-1)$, and the standard error of the latter by multiplying that of $\sigma$ by $\sqrt{N/(N-1)}$.

Treating the mean, variance, third and fourth moments in the same general situation has enabled a certain survey of types of statistics. Though we shall return to the most important type, measures of simple variability of distributions and of derived statistics, the student should now have the basic concepts of variability, and in particular of standard error, which will enable him to appraise the different measures of central tendency treated in Chapter VII.

SECTION 9. THE AVERAGE DEVIATION 

Though the standard deviation (or its square, 
the variance) is the most serviceable measure of 
dispersion because (1) of the importance and 
simplicity of the algebraic relationships in 
which it is found and because (2) it is generally 
the most reliable, there are many other measures, 
some of which have genuine merit for one reason 
or another. 

The average deviation is nearly as reliable, in most situations, as the standard deviation, and as it is more readily computed it may be serviceable if the data are so extensive that the labor of computation is a decisive factor. It lacks the simplicity and the properties that endear the standard deviation to the statistician, so when used it should be as a terminal statistic, one not used in further combinatorial processes.

The average deviation is defined as the arithmetic mean of the absolute values of the deviations of the measures in a series from their mean. Thus

$$\text{A.D.} = \frac{\Sigma|X-M|}{N} \qquad \text{Definition of average deviation} \qquad [6:65]$$

If the absolute values of deviations are taken from some point, $P$, other than the mean, the resulting value must be labeled "average deviation from the point $P$."

$$\text{A.D. from } P = \frac{\Sigma|X-P|}{N} \qquad [6:66]$$


We have

$$\Sigma|X-P| = \sum^{X<P}(P-X) + \sum^{X>P}(X-P)$$

The superscript "$X<P$" indicates that the summation covers all values for which $X$ is less than $P$. Values of $X = P$ may be thought of as located in either of the right-hand member summations, for these null differences will not affect the outcome. Adding and subtracting $\sum^{X<P}(P-X)$,

$$\Sigma|X-P| = \sum^{X<P}(P-X) + \sum^{X<P}(P-X) + \sum^{X<P}(X-P) + \sum^{X>P}(X-P)$$

$$= 2\sum^{X<P}(P-X) + \Sigma(X-P) = 2\sum^{X<P}(P-X) + NM - NP$$

If the number of measures below $P$ is $b$, this may be written

$$\Sigma|X-P| = (2b-N)P + NM - 2\sum^{X<P}X$$

so that

$$\text{A.D. from } P = \frac{1}{N}\left[(2b-N)P + \Sigma X - 2\sum^{X<P}X\right] \qquad [6:67]$$

If measures are arranged in a frequency distribution and their sum, $\Sigma X$, gotten on an adding machine, it is only necessary to get the subtotal at the appropriate point to have $\sum^{X<P}X$. The average deviation from $P$ thus is a very simple statistic to compute. If the A.D. is desired, a single listing, with a few subtotals in the neighborhood of the anticipated mean, will provide the two summations needed to yield both the mean and the average deviation. When $P = M$, formula [6:67] becomes

$$\text{A.D.} = \frac{2}{N}\sum^{X<M}(M - X) \qquad \text{The average deviation} \qquad [6:68]$$
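A sketch checking [6:67] and [6:68] against the direct definitions [6:66] and [6:65] on hypothetical scores. In the subtotal version only the count $b$ and the sum of the measures below $P$ are needed, which is what makes the adding-machine procedure work. The function names are our own.

```python
def ad_direct(xs, p):
    """[6:66]: average deviation from the point P."""
    return sum(abs(x - p) for x in xs) / len(xs)

def ad_by_subtotal(xs, p):
    """[6:67]: (1/N)[(2b - N)P + Sigma X - 2 Sigma^{X<P} X], b = count below P."""
    n = len(xs)
    below = [x for x in sorted(xs) if x < p]   # the subtotal read off the listing
    b = len(below)
    return ((2 * b - n) * p + sum(xs) - 2 * sum(below)) / n

def ad_about_mean(xs):
    """[6:68]: A.D. = (2/N) Sigma^{X<M} (M - X)."""
    m = sum(xs) / len(xs)
    return 2 * sum(m - x for x in xs if x < m) / len(xs)

scores = [61, 64, 64, 67, 70, 70, 70, 73, 79, 82]   # hypothetical; mean = 70
for p in (60, 67, 70, 75):
    print(p, ad_direct(scores, p), ad_by_subtotal(scores, p))
print(ad_about_mean(scores), ad_direct(scores, sum(scores) / len(scores)))
```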


When $P$ = the median, formula [6:67] yields the average deviation from the median. This is a minimum. Otherwise expressed, the median is such a point that the sum of the absolute values of the deviations from it of the measures in the series is a minimum. The proof of this, which is very simple when all the measures in the series are different, is left as an exercise. To recapitulate: the basic and unique properties of the median are (1) the number of measures below it is equal to the number above it, and (2) the sum of the absolute deviations from it is a minimum.
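A brute-force check of the minimum property of the median, on a hypothetical series with an odd number of measures: trying many points $P$ and keeping the one with least $\Sigma|X-P|$ lands on the median, while the mean gives a larger sum. The grid of trial points is our own choice.

```python
import statistics

def sum_abs_dev(xs, point):
    return sum(abs(x - point) for x in xs)

series = [2, 3, 3, 7, 8, 10, 21]               # hypothetical; median 7, mean about 7.71
median = statistics.median(series)
trial_points = [k / 4 for k in range(0, 101)]  # 0, .25, ..., 25
best = min(trial_points, key=lambda p: sum_abs_dev(series, p))
print(median, best, sum_abs_dev(series, median), sum_abs_dev(series, statistics.mean(series)))
```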

The reliability of the average deviation. Since the average deviation from an arbitrary fixed point is simply the arithmetic mean of $N$ absolute values, its standard deviation is given by the usual formula [6:18] for the standard error of the mean, using the absolute values ($Y$ below) as the measures of the series.

Let $X$ = the original measures of the series.

Let $P$ = a fixed point, or value of $X$.

Let $Y = |X - P|$.

Then A.D. from $P$ = the mean $\overline{Y}$, and $y = Y - \overline{Y}$.

$$\sigma_{\text{A.D. from } P} = \frac{\sigma_y}{\sqrt{N-1}} \qquad \text{Standard deviation of average deviation from fixed point } P \qquad [6:69]$$


If a point which is a linear function of the data, as is the mean, is the point of reference, one degree of freedom is consumed thereby. Thus we have for the standard deviation of the average deviation:

Let $X$ = the original measures of the series.

Let $Z = |X - M|$.

Then A.D. = the mean $\overline{Z}$, and $z = Z - \overline{Z}$.

$$\sigma_{\text{A.D.}} = \frac{\sigma_z}{\sqrt{N-2}} \qquad \text{Standard deviation of the average deviation} \qquad [6:70]$$

Though in general the standard deviation is a more reliable measure than the average deviation, i.e., $\sigma/\sigma_\sigma$ is greater than $\text{A.D.}/\sigma_{\text{A.D.}}$, leptokurtic distributions exist for which this is not the case. In R. A. Fisher (1921) is a basic treatment of "consistent" and "efficient" statistics.

SECTION 10. BASED UPON PERCENTILES

The distance between any two percentiles, $P_p - P_{p'}$, is an interpercentile range and is a measure of variability, though a very poor measure if the two proportions determining the percentiles are near together. It is also a poor measure if either of the proportions is very near zero or one, for in the case of unimodal distributions not arbitrarily bounded at an extreme, the corresponding percentile is much less reliable than a percentile somewhat removed from zero or one hundred.

The most reliable interpercentile range is a function of the form of the population distribution. The standard error of P_p − P_p' is derivable from the standard deviation of a percentile [4:03] and the covariance (as explained in Chapter X, Section 5) between percentiles. Using c to indicate covariance, p > p', and other symbols as earlier defined, it can be shown that

$$c(P_p, P_{p'}) = N\,\frac{i\,i'\,q\,p'}{f_p\,f_{p'}}$$

Covariance between the P_p and P_p' percentiles [6:71]


Utilizing [4:03] and the formula for the variance of a difference [10:106] we obtain

$$\sigma^2_{P_p - P_{p'}} = N\left(\frac{p\,q\,i^2}{f_p^2} + \frac{p'\,q'\,i'^2}{f_{p'}^2} - \frac{2\,q\,p'\,i\,i'}{f_p\,f_{p'}}\right)$$

Variance of an interpercentile range [6:72]
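Formula [6:72] can be checked by simulation. The sketch below is written for a continuous population, so each i/f is replaced by 1/(N × density at the percentile). This is the reciprocal of N times the proportionate density f_p/(iN). The sketch draws repeated samples from a unit normal population and takes crude sample percentiles (the order statistic of rank p(N + 1)). It then compares the observed variance of P.93 − P.07 with the formula. The sample size, number of replications, seed, and helper name are arbitrary choices, not part of the text.

```python
import random
from statistics import NormalDist, pvariance

def interpercentile_variance(p, p_prime, dens_p, dens_pp, n):
    """[6:72] with i/f replaced by 1/(n * density); requires p > p_prime."""
    q, q_prime = 1 - p, 1 - p_prime
    return (p * q / dens_p**2
            + p_prime * q_prime / dens_pp**2
            - 2 * q * p_prime / (dens_p * dens_pp)) / n

nd = NormalDist()                        # unit normal population
n, reps, p, pp = 400, 2000, 0.93, 0.07
rng = random.Random(12345)

ranges = []
for _ in range(reps):
    xs = sorted(rng.gauss(0.0, 1.0) for _ in range(n))
    upper = xs[round(p * (n + 1)) - 1]   # crude sample P.93
    lower = xs[round(pp * (n + 1)) - 1]  # crude sample P.07
    ranges.append(upper - lower)

theory = interpercentile_variance(p, pp, nd.pdf(nd.inv_cdf(p)), nd.pdf(nd.inv_cdf(pp)), n)
observed = pvariance(ranges)
print(f"simulated {observed:.5f}   formula [6:72] {theory:.5f}")
```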


The most reliable interpercentile range is the one for which (P_p − P_p')/σ_(P_p − P_p') is a maximum. This can be determined when the form of distribution is known. The writer has shown (Kelley, 1921) that, to a close approximation, in the case of a normal distribution this is the distance from the 7th to the 93rd percentile. This can be designated the normal optimal interpercentile range. Since it has merit for many of the non-normal unimodal distributions commonly found, we will give it a general designation and call it Pv.

$$P_v = P_{.93} - P_{.07}$$

A meritorious percentile measure of variability [6:73]


Letting p = .93 and p' = q = .07, the variance of Pv is given by [6:72], which then becomes

$$\sigma^2_{P_v} = N\left(\frac{.0651\,i^2}{f_p^2} + \frac{.0651\,i'^2}{f_{p'}^2} - \frac{.0098\,i\,i'}{f_p\,f_{p'}}\right)$$

Variance error of Pv [6:74]
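The statement that the 7 to 93 range is nearly optimal for a normal population can be checked with a short scan. The sketch assumes symmetric ranges (p' = 1 − p) drawn from a unit normal population. It applies [6:72] in continuous form, with the same density at both percentiles, and searches for the p that maximizes the ratio of the range to its standard error, per √N. The function name is hypothetical.

```python
from statistics import NormalDist

nd = NormalDist()

def range_to_se_ratio(p):
    """(P_p - P_q) / sigma of the range, per sqrt(N), for a unit normal, q = 1 - p."""
    q = 1.0 - p
    z = nd.inv_cdf(p)
    dens = nd.pdf(z)                                    # same density at P_p and P_q
    var_times_n = (2 * p * q - 2 * q * q) / dens**2     # [6:72] with p' = q
    return 2 * z / var_times_n ** 0.5

grid = [0.51 + k / 1000 for k in range(480)]            # p from .510 to .989
best_p = max(grid, key=range_to_se_ratio)
print(f"best p = {best_p:.3f}, ratio = {range_to_se_ratio(best_p):.4f} sqrt(N)")
print(f"7-93: {range_to_se_ratio(0.93):.4f}   10-90: {range_to_se_ratio(0.90):.4f}")
```

The 10 to 90 range comes out only slightly below the 7 to 93 range, which agrees with the comparison of the two ranges later in this chapter.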


A percentile measure of variability that has had wide usage is the quartile deviation, or semi-interquartile range, which is commonly designated Q. The reader must distinguish between Q and Q_1, Q_2, and Q_3. These are common designations of the 1st, 2nd, and 3rd quartiles, or P.25, P.50, and P.75, respectively.

$$Q = \frac{P_{.75} - P_{.25}}{2}$$

Quartile deviation [6:75]

The variance error of Q is given by [6:76], in which p = .75 and p' = .25:

$$\sigma^2_Q = \frac{N}{4}\left(\frac{.1875\,i^2}{f_p^2} + \frac{.1875\,i'^2}{f_{p'}^2} - \frac{.125\,i\,i'}{f_p\,f_{p'}}\right)$$

Variance error of quartile deviation [6:76]

SECTION 11. COMPARISON OF SUNDRY MEASURES OF VARIABILITY

The relative standard error of Q is generally much larger than that of Pv. That of Pv is in turn generally larger than that of A.D., and that of A.D. is in turn generally larger than that of σ. The relationships of these measures of variability in a normal distribution are shown in Table VI G.

TABLE VI G

RATIOS OF CERTAIN MEASURES OF VARIABILITY TO THEIR STANDARD ERRORS IN THE CASE OF SAMPLES DRAWN FROM A NORMAL POPULATION

Standard error   Critical ratio              Size of sample to yield relative
formula                                      reliability equal to that of σ
                                             from a sample of 100

[6:60]*          σ/σ_σ = 1.4142 √N           100
[6:70]*          A.D./σ_A.D. = 1.3236 √N     114
[6:74]           Pv/σ_Pv = 1.1421 √N         153
[6:76]           Q/σ_Q = .8573 √N            272

* Formulas used, except as modified to fit theoretical normal distributions.
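The entries of Table VI G can be reproduced from large-sample normal theory. The sketch below makes several assumptions. It takes σ_σ = σ/√(2N). It treats the average deviation as taken about a known mean, so that the ratio is E|x|/SD(|x|) for a unit normal. It applies [6:72] with p' = q for Pv and for the quartile range, since halving the range to get Q cancels in the ratio. Finally, it converts each ratio into the sample size that matches σ at N = 100. The names are illustrative only.

```python
from math import pi, sqrt
from statistics import NormalDist

nd = NormalDist()

def symmetric_range_ratio(p):
    """(P_p - P_q) / SE, per sqrt(N), for a unit normal with q = 1 - p ([6:72], p' = q)."""
    q = 1 - p
    z = nd.inv_cdf(p)
    dens = nd.pdf(z)
    return 2 * z / sqrt((2 * p * q - 2 * q * q) / dens**2)

ratios = {
    "sigma": sqrt(2),                    # sigma / sigma_sigma = sqrt(2N)
    "A.D.": sqrt(2 / (pi - 2)),          # E|x| / SD of |x| for a unit normal
    "Pv": symmetric_range_ratio(0.93),
    "Q": symmetric_range_ratio(0.75),    # halving the quartile range cancels in the ratio
}
equivalent_n = {k: 100 * (ratios["sigma"] / r) ** 2 for k, r in ratios.items()}
for k in ratios:
    print(f"{k:6s} {ratios[k]:.4f} sqrt(N)   N matching sigma at 100: {equivalent_n[k]:.0f}")
```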
Table VI G shows three things for samples drawn from a normal population. The average deviation is nearly as reliable as the standard deviation. The quartile deviation is a very unreliable measure. Pv, though the best of the interpercentile measures, requires 53 per cent more cases than the standard deviation to obtain comparable reliability. The formula for the variance of an interpercentile range [6:72] reveals that the 10th to 90th percentile range is nearly as reliable as Pv. It may well be used in lieu thereof in situations wherein the 10th and 90th percentiles are already available.

Where considerations of simplicity or immediate meaningfulness of a statistic have led to the use of percentiles, it is consistent to employ an interpercentile measure as a measure of variability. Either Pv or the 10 to 90 percentile range is recommended. In this same situation the percentile measures of skewness and of kurtosis given in Chapter VII are recommended. The statistician will realize, however, that all these percentile measures are terminal statistics and very cumbersome to incorporate into further computational procedures.


CHAPTER VII 

MEASURES OF CENTRAL TENDENCY 


SECTION 1. THE EFFECT OF FORM OF DISTRIBUTION UPON DIFFERENT AVERAGES


Averages: In common usage the term "average" indicates the sum of the measures divided by their number. In statistical parlance this is the "arithmetic mean," and any measure whatsoever of central tendency is an "average." According to a common definition, any function of a series of measures is an average of them if it equals them in the special case when they all have the same value. This definition is so broad as not in itself to guarantee useful properties, so we will consider a more restricted class of averages.

If X_1, X_2, ..., X_N are all positive and are the measures in the series, a general expression giving many averages, but not all possible ones, is

$$f(b) = \left(\frac{\sum X^b}{N}\right)^{1/b}$$

A generalized mean [7:01]


As b takes different values we obtain different sorts of means. When b = 1, equation [7:01] yields the most common mean, the arithmetic mean. When b = −1, equation [7:01] yields the harmonic mean. When b = 2, equation [7:01] yields the quadratic mean. This is not the standard deviation. The standard deviation is the quadratic mean of deviations x whose sum Σx = 0, while ΣX does not in general equal 0. When b = 0, equation [7:01] yields an indeterminate form, whose limit as b approaches zero can be shown to be the geometric mean.*

* By definition we have the following for the geometric mean:

$$\mathrm{G.M.} = (X_1 X_2 X_3 \cdots X_N)^{1/N}, \quad\text{or}\quad \log \mathrm{G.M.} = \frac{\sum (\log X)}{N}$$

From [7:01] we also have

$$[f(b)]^b = \frac{X_1^b + X_2^b + X_3^b + \cdots + X_N^b}{N}$$

Since X^b can be expanded into a convergent series, as follows (logarithms to base e):

$$X^b = 1 + b \log X + b^2\,\frac{(\log X)^2}{2!} + \text{higher powers in } b,$$

we can set X^b ≈ 1 + b log X as b approaches zero. Thus

$$[f(b)]^b \approx \frac{(1 + b \log X_1) + (1 + b \log X_2) + \cdots + (1 + b \log X_N)}{N} = 1 + b\,\frac{\sum \log X}{N} = 1 + b \log \mathrm{G.M.} \approx (\mathrm{G.M.})^b,$$

giving f(b) → G.M. as b approaches zero.
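The limiting argument can be illustrated numerically. The Python sketch below defines a hypothetical helper `power_mean` for [7:01], taking b = 0 to mean the limit, the geometric mean. It shows f(b) approaching (X_1 X_2 ... X_N)^(1/N) as b shrinks. The data are arbitrary positive measures.

```python
from math import exp, log, prod

def power_mean(xs, b):
    """Generalized mean f(b) of [7:01] for positive measures; b = 0 gives its limit."""
    n = len(xs)
    if b == 0:
        return exp(sum(log(x) for x in xs) / n)       # geometric mean
    return (sum(x ** b for x in xs) / n) ** (1 / b)

xs = [1.0, 2.0, 4.0, 8.0]
gm = prod(xs) ** (1 / len(xs))                         # (X1 X2 ... XN)^(1/N)
for b in (0.1, 0.01, 0.001, 0.0001):
    print(f"b = {b:<7} f(b) = {power_mean(xs, b):.6f}   G.M. = {gm:.6f}")
```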
When b → −∞, equation [7:01] yields the smallest measure in the series, and when b → +∞ it yields the largest measure. In all cases f(b) increases as b increases from −∞ to +∞ (unless all the measures are equal, when it is constant). This immediately proves that all of the means given by [7:01] are internal means, i.e., they fall within the range of the measures in the series. It also proves that the harmonic mean is smaller than the geometric mean, which is smaller than the arithmetic mean, which is smaller than the quadratic mean, etc. Reasons inherent in, or exterior to, the data may assert that small measures should play a more important part than large measures in determining the average. It then follows that b of equation [7:01] should, say, equal 0 or −1 rather than 1, i.e., that the geometric or the harmonic mean should be used rather than the arithmetic mean. Averages other than those defined by [7:01] exist, such as the median and the mode.
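The ordering of the means and the behavior at the extremes can be seen with a short sketch. It evaluates [7:01] over a range of b for an arbitrary positive series, using a hypothetical helper `power_mean` in which b = 0 stands for the geometric-mean limit. Very negative and very positive b push f(b) toward the smallest and largest measures.

```python
from math import exp, log

def power_mean(xs, b):
    """Generalized mean f(b) of [7:01]; b = 0 is the geometric-mean limit."""
    n = len(xs)
    if b == 0:
        return exp(sum(log(x) for x in xs) / n)
    return (sum(x ** b for x in xs) / n) ** (1 / b)

xs = [0.5, 1.0, 3.0, 9.0]                   # illustrative positive measures
bs = [-50, -2, -1, 0, 1, 2, 50]
values = [power_mean(xs, b) for b in bs]
for b, v in zip(bs, values):
    print(f"b = {b:>4}: {v:.4f}")

# b = -1, 0, 1, 2 give the harmonic, geometric, arithmetic, and quadratic means.
harmonic, geometric, arithmetic, quadratic = values[2:6]
```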

The averages that we shall consider in some detail are the median, the mode, the geometric mean, the harmonic mean, and the arithmetic mean, which last will frequently be called "the mean."

The usual order of trustworthiness of various measures of a distribution: For many distributions of quantitative data the features of greatest stability are, in order:

1. The number of cases, N. This is usually 
fixed by the experimenter and thus has no samp- 
ling or experimental error. If the principle of 
sampling does not fix the number of cases, as 
would be the case if the ages of all the men 
passing a selected spot in a certain hour were 
recorded, then of course there is a sampling 
error in N, but this is not the usual mode of
sampling. We may therefore generally think of 
the number of cases in the sample as having no 
sampling error. 

2. Some measure of variability, generally the standard deviation. The usual untutored expectation would place the mean first, but it is generally a fact that a measure of variability has greater reliability.

3. Some average, generally the arithmetic 
mean. 

4. Some measure of skewness, such as μ3/σ_μ3 (see [6:47], [6:61], and [6:62]), As/σ_As (see [7:16] and [7:17]), β1/σ_β1 (see K. Pearson, Tables), Sk/σ_Sk (see [7:18] and [7:19]), or g1/σ_g1 (see R. A. Fisher, 1934).

5. Some measure of kurtosis, such as (β2 − 3)/σ_β2 (see K. Pearson, Tables), (Ku − 2.7885)/σ_Ku (see [7:08] and [7:09]), or g2/σ_g2 (see R. A. Fisher, 1934).

6. Some measure of multimodality.

The order of reliability of statistics here 
given is not invariable for it is affected by 
the form of distribution of the data. Certain 
unusual forms may augment the reliability of 
some of the later measures in this list, or of 
other measures not mentioned, while decreasing 
that of some of the earlier measures. Also the 
list is suspect because the basis of comparison 
of the disparate measures has not been given and 
defended. Is the basis that of comparing per- 
centage errors, or absolute errors, or something 
else? A defense based upon statistical concepts 
thus far presented in this text is impossible, 
so it must suffice to ask the reader to return 
to this issue of the order of trustworthiness of 
different measures as his knowledge about them 
increases. Until he can form an independent 
judgment, based upon his particular data, he is 
advised to attempt so to set up his issues that 
statistics low rather than high in this list provide the requisite information. For example, suppose the question is, "Are there more men than women of a high level of intelligence?" One study might compare averages of men and women, another compare variabilities of men and of women, and another investigate the kurtosis of men and of women. Each of these bears upon the issue. Since variability is number two in the list, the average number three, and kurtosis number five, the initial approach suggested is via a study of the variabilities of men and women. Certainly a study based upon proof of multimodality should be undertaken only as a last resort, that is, only in case no measures lower in the list will answer the issue.

Common forms of distribution: Charts VII I, VII II, and VII III show various Pearson-type curves. The equations of these curves will not now concern us. Sometimes two curves are shown under a single type number. In that case one of them is a rather typical curve of this type and the other an extreme curve of the type in one direction, and the letter near the left or right margin gives the extreme curve in the other direction. For example, the two-category type varies from the M curve, wherein are equal frequencies in two classes, through the curve having unequal frequencies in two classes, to the limiting case where all the frequencies lie in one class. M stands for Mendelian, because of the importance in heredity studies of the two-point distribution having equal frequencies in the two categories. The letters M, R (rectangular), N (normal), P (parabolic), E (exponential), and L (straight line) stand for curves as illustrated in the first six figures of Chart VII I. All curves having a single mode not at a dictated upper or lower boundary are referred to as unimodal, or "A", curves. All having an anti-mode are called anti-modal, or "U", curves. All having neither a mode nor an anti-mode except at a boundary are called J-shaped, or "J", curves. These Pearson-type curves seem to cover pretty well the range of the simpler phenomena of distribution. One omission which should be classed as simple is the point distribution. This consists of frequencies at discrete points along a graduated scale. An example is the number of children in families: the frequencies occur at 0, 1, 2, 3, etc., but never at fractional values. Many point distributions, especially those having ten or more classes, may be handled excellently as one would handle grouped data of a continuous variable. Certain cautions will be obvious. For example, we may speak of the mean number of children per family as 2.30, but we hardly should say that the median or the modal family has a fractional number of children. (See Elderton, 1927.)

[Chart VII I. Pearson-type curves: M (zero base, or zero-width class interval); R, rectangular; N, normal or Gaussian; P, parabolic; E, exponential (Type X); L, straight line; A, a symmetrical leptokurtic curve shown with the normal curve for comparison.]


SECTION 2. THE MEDIAN

The computation of the median, or the fiftieth percentile, P.50, and of its standard error has been given in Chapter IV, Section 3. Formula [4:03], giving the standard error of any percentile, is

$$\sigma_{P_p} = \frac{i\sqrt{Npq}}{f_p} \qquad \text{See [4:03]}$$

For the median p = q = .5, and this formula becomes

$$\sigma_{\mathrm{Mdn}} = \frac{i\sqrt{N}}{2 f_p} \qquad [7:02]$$

Equation [7:02] may be written

$$\sigma_{\mathrm{Mdn}} = \frac{1}{2\,\dfrac{f_p}{iN}\,\sqrt{N}} \qquad [7:02]$$

The quotient f_p/(iN) is the proportionate density of frequency per unit interval. As this is found in the denominator, the standard error of the median decreases as the density of cases increases. Accordingly, the median becomes a better and better measure as the density of cases in the neighborhood of the median increases. Generally speaking, the mean is more reliable than the median, but this is not so if the distribution is sufficiently leptokurtic. Curve A, Chart VII I, is a symmetrical unimodal leptokurtic curve for which, approximately, the standard error of the median is equal to that of the mean. This curve has greater relative frequency than the normal curve both near the middle, which increases the reliability of the median, and near the extremes, which decreases the reliability of the mean. Another wording, which applies to unimodal curves, is that a leptokurtic curve is low at the hips, the positions intermediate between the central and the extreme portions. A mesokurtic curve, of which the normal curve is an example, has medium hip height, and a platykurtic curve is high at the hips.

Elderton's "twisted J-shaped" type is one in which the small tail of the J turns down instead of up.
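The two forms of [7:02] agree, as the sketch below shows. The grouped-data values (N, i, f_p) are hypothetical. The second part uses the unit normal density at the median, 1/√(2π), with the large-sample σ_M = σ/√N. It reproduces the familiar result that σ_Mdn is about 1.2533 times σ_M in a normal population, which is why a good deal of leptokurtosis is needed before the median overtakes the mean.

```python
from math import pi, sqrt

def se_median_grouped(n, i, f_p):
    """[7:02], first form: i * sqrt(N) / (2 * f_p)."""
    return i * sqrt(n) / (2 * f_p)

def se_median_density(n, density):
    """[7:02], second form: 1 / (2 * density * sqrt(N)), density = f_p / (i N)."""
    return 1 / (2 * density * sqrt(n))

n, i, f_p = 400, 2.0, 60        # hypothetical: 60 cases in a median interval of width 2
a = se_median_grouped(n, i, f_p)
b = se_median_density(n, f_p / (i * n))
print(a, b)

# Unit normal population: density at the median is 1/sqrt(2*pi); sigma_M = 1/sqrt(N).
ratio = se_median_density(n, 1 / sqrt(2 * pi)) / (1 / sqrt(n))
print(round(ratio, 4))          # sigma_Mdn / sigma_M
```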

The more leptokurtic the distribution, the greater the merit of the median as an average. We must note, however, that a high degree of leptokurtosis is necessary before the standard error of the median becomes less than that of the mean. We will shortly provide measures of kurtosis. Even without them the reader can get a good visual impression of Curve A, and can then expect that if his data are still more leptokurtic the median is to be preferred to the mean as a measure of central tendency.

For all distributions the simplicity of meaning 
of the median and of interpercentile ranges highly 
recommends them for use with popular audiences, 
but only in the case of highly leptokurtic distri- 
butions do they have superior status because of 
reliability. 

An exact and general quantitative relationship between the kurtosis of a curve and the relative excellence of mean and median is impossible. Distributions with the same kurtosis, [7:03], may be non-identical in other respects, such as the relative frequency in the neighborhood of the median.




$$\beta_2 = \frac{\mu_4}{\mu_2^2}$$

Pearson's measure of kurtosis [7:03]

β2 = 3.00 for mesokurtic distributions, of which the normal is an example. Thus for a test of divergence from mesokurtosis we have

$$\frac{\beta_2 - 3}{\sigma_{\beta_2}}$$

Critical ratio, providing a mesokurtosis test based upon β2 [7:04]

Pearson (1914) has provided a table from which an approximate and entirely serviceable value of σ_β2 can be found.
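For a rough check without Pearson's table, the sketch below computes β2 from sample moments (divisor N) and forms the critical ratio [7:04]. It substitutes the standard large-sample value σ_β2 ≈ √(24/N), which holds for a normal parent. That value is an assumption of this sketch, not the tabled value the text recommends. The function names and the rectangular example series are illustrative.

```python
from math import sqrt

def beta2(xs):
    """Pearson's beta_2 = mu_4 / mu_2^2 from sample moments (divisor N), [7:03]."""
    n = len(xs)
    m = sum(xs) / n
    mu2 = sum((x - m) ** 2 for x in xs) / n
    mu4 = sum((x - m) ** 4 for x in xs) / n
    return mu4 / mu2 ** 2

def mesokurtosis_cr(xs):
    """(beta_2 - 3) / sigma_beta2, assuming the large-sample normal value sqrt(24/N)."""
    return (beta2(xs) - 3) / sqrt(24 / len(xs))

flat = list(range(1, 101))        # a rectangular (platykurtic) series, N = 100
print(round(beta2(flat), 4), round(mesokurtosis_cr(flat), 2))
```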

The obvious and crucial test of excellence of 
mean and median is to compute the standard errors 
of each for the distribution in question and 
assert that the one with the smaller standard 
error is the more reliable . Some practice in 
doing this will provide the student with such 
insight that he can generally tell which is the 
more reliable measure by looking at the graph of 
the distribution. Equations [7:05] and [7:06] following are of two leptokurtic distributions having equally reliable mean and median. In appearance each deviates slightly from Curve A of Chart VII I. Curve [7:05] is given by the sum of two normal distributions, and [7:06] is a Pearson Type VII curve.

If in an equation e has an exponent, f(x), which is elaborate, it makes for typographical simplicity to write "exp f(x)" in lieu of "e^f(x)", as has been done in [7:05]:

$$y = \frac{N}{2\sigma\sqrt{2\pi}}\left[k_1 \exp\left(-\frac{k_1^2 x^2}{2\sigma^2}\right) + k_2 \exp\left(-\frac{k_2^2 x^2}{2\sigma^2}\right)\right] \qquad [7:05]$$

in which k_1 and k_2 are the roots of

$$k = \sqrt{.5\pi} \pm \sqrt{.5\pi + .5 - \sqrt{.25 + \pi}} = 1.7320 \text{ or } .7746.$$

The fourth moment of [7:05] is

$$\mu_4 = \frac{24\left[4\pi^2 + 6\pi + 1 - (4\pi + 1)\sqrt{1 + 4\pi}\right]\sigma^4}{\left(1 - \sqrt{1 + 4\pi}\right)^4} = 4.3333\,\sigma^4.$$

Thus for this curve β2 = 4.3333.
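The constants of [7:05] can be verified directly. The sketch takes σ = 1, so each component of [7:05] is a normal curve with standard deviation 1/k, with the two components equally weighted. It then checks four things: that the roots give unit variance, that the fourth moment agrees with the closed form above, that β2 is about 4.3333, and that the median's standard error from [7:02] equals σ/√N. The last check uses the ordinate at the center per case, (k_1 + k_2)/(2√(2π)).

```python
from math import pi, sqrt

r = sqrt(0.25 + pi)
k1 = sqrt(0.5 * pi) + sqrt(0.5 * pi + 0.5 - r)
k2 = sqrt(0.5 * pi) - sqrt(0.5 * pi + 0.5 - r)

# Equal-weight mixture of normals with SDs 1/k1 and 1/k2 (sigma = 1 overall).
var = (1 / k1**2 + 1 / k2**2) / 2
mu4 = 3 * (1 / k1**4 + 1 / k2**4) / 2
closed = 24 * (4 * pi**2 + 6 * pi + 1 - (4 * pi + 1) * sqrt(1 + 4 * pi)) / (1 - sqrt(1 + 4 * pi)) ** 4

dens0 = (k1 + k2) / (2 * sqrt(2 * pi))     # ordinate at x = 0, per case
se_ratio = 1 / (2 * dens0)                 # sigma_Mdn / sigma_M from [7:02]
print(round(k1, 4), round(k2, 4), round(var, 6), round(mu4, 4), round(se_ratio, 6))
```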


$$y = y_0\left(1 + \frac{x^2}{2.6862\,\sigma^2}\right)^{-2.8431} \qquad [7:06]$$

The Pearson β2 is 11.7434, but the standard error of β2, as computed by Pearson's method, is infinite for this very leptokurtic Type VII curve. We may use the following as a preliminary guide, but not as a substitute for the crucial test mentioned:

β2 > 5     β2 standard suggesting the median as a more reliable measure than the mean [7:07]
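A similar check can be made for the Type VII curve [7:06]. The sketch relies on two standard Type VII relations that the text does not state. For y = y_0(1 + x²/a²)^(−m), β2 = 3(2m − 3)/(2m − 5), and σ² = a²/(2m − 3). With these, the sketch recovers the exponent 2.8431 and the constant 2.6862 from β2 = 11.7434. It then confirms, through the normalized ordinate at the center, that the median and mean have very nearly equal standard errors.

```python
from math import gamma, pi, sqrt

beta2 = 11.7434
# Type VII: beta2 = 3(2m - 3)/(2m - 5), solved for m.
m = (5 * beta2 - 9) / (2 * beta2 - 6)
a2 = 2 * m - 3                      # a^2 in units of sigma^2 (sigma = 1)
a = sqrt(a2)

# Ordinate at the center per case: Gamma(m) / (a sqrt(pi) Gamma(m - 1/2)).
dens0 = gamma(m) / (a * sqrt(pi) * gamma(m - 0.5))
se_ratio = 1 / (2 * dens0)          # sigma_Mdn / sigma_M, from [7:02] with sigma_M = sigma/sqrt(N)
print(round(m, 4), round(a2, 4), round(se_ratio, 4))
```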
A similar standard may be based upon the quotient of certain interpercentile ranges. Being based upon percentiles, it has a finite standard error even with extreme leptokurtosis. Trial suggests that for this purpose the quotient between a greater and a lesser interpercentile range than Pv will serve. Calling the 3 to 97 interpercentile range Tns (initials of "three-ninety-seven"), we have

$$Ku = \frac{P_{.97} - P_{.03}}{P_{.75} - P_{.25}} = \frac{Tns}{2Q}$$

Percentile measure of kurtosis [7:08]


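The variance error of Ku is obtainable from [7:09], in which p = .97 and p' = .75, so that q = .03 and q' = .25. Each i and f bears the subscript of the percentile to which it refers.

$$V(Ku) = N\,Ku^2\left[\frac{pq\left(\dfrac{i_p^2}{f_p^2} + \dfrac{i_q^2}{f_q^2}\right) - \dfrac{2q^2 i_p i_q}{f_p f_q}}{Tns^2} + \frac{p'q'\left(\dfrac{i_{p'}^2}{f_{p'}^2} + \dfrac{i_{q'}^2}{f_{q'}^2}\right) - \dfrac{2q'^2 i_{p'} i_{q'}}{f_{p'} f_{q'}}}{4Q^2} - \frac{q}{Q\,Tns}\left(\frac{p' i_p i_{p'}}{f_p f_{p'}} - \frac{q' i_p i_{q'}}{f_p f_{q'}} - \frac{q' i_q i_{p'}}{f_q f_{p'}} + \frac{p' i_q i_{q'}}{f_q f_{q'}}\right)\right]$$

Variance error of Ku [7:09]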

For [7:05], Ku = 4.60 and its standard error = 8.27/√N. We may use the following as a guide:

Ku > 5     Ku standard suggesting the median as a more reliable measure than the mean [7:10]

In a normal distribution Ku = 2.7885     Mesokurtic Ku
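The mesokurtic value of Ku follows from the normal percentiles, as the sketch below shows. It also computes Ku for a sample using `statistics.quantiles` with 100 groups, so that cut point k − 1 estimates P_(k/100). The seed and the sample size of 20,000 are arbitrary, and the function names are illustrative.

```python
import random
from statistics import NormalDist, quantiles

def ku(p03, p25, p75, p97):
    """Percentile measure of kurtosis [7:08]: Tns / 2Q."""
    return (p97 - p03) / (p75 - p25)

def ku_from_sample(xs):
    cuts = quantiles(xs, n=100)          # cuts[k - 1] estimates P_(k/100)
    return ku(cuts[2], cuts[24], cuts[74], cuts[96])

nd = NormalDist()
ku_normal = ku(*(nd.inv_cdf(p) for p in (0.03, 0.25, 0.75, 0.97)))

rng = random.Random(7)
sample = [rng.gauss(0.0, 1.0) for _ in range(20000)]
print(round(ku_normal, 4), round(ku_from_sample(sample), 3))
```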
Higher moments, and even the standard deviation, are extremely unreliable in highly leptokurtic distributions. In view of this we may conclude that when the median is preferred to the mean, three further preferences follow. An interpercentile measure of variability, such as Pv [6:73], is to be preferred to the standard deviation. A percentile measure of skewness, such as Sk [7:18], is to be preferred to β1 [7:12]. A percentile measure of kurtosis, such as Ku [7:08], is to be preferred to β2 [7:03].

The utility of measures of skewness is discussed in Chapter VII, Section 3, in connection with the mode.

In certain problems the selection of measures because of high reliability may have to yield to selection because of amenability to algebraic manipulation. This latter consideration may dictate the use of M, V, μ3, μ4, β1, β2, or of Fisher's unbiased k-statistics,

$$k_1 = M, \quad k_2 = \frac{N}{N-1}V, \quad k_3 = \frac{N^2}{(N-1)(N-2)}\mu_3, \quad k_4 = \frac{N^2\left[(N+1)\mu_4 - 3(N-1)\mu_2^2\right]}{(N-1)(N-2)(N-3)},$$

and of

$$g_1 = \frac{k_3}{k_2^{3/2}} \quad\text{and}\quad g_2 = \frac{k_4}{k_2^2}.$$
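The k-statistics above are straightforward to compute from the sample moments (divisor N). The sketch below is a minimal version with a hypothetical function name. It is checked on a small symmetric series, for which k3 and g1 vanish and k2 equals the usual unbiased variance.

```python
from statistics import variance

def k_statistics(xs):
    """Fisher's k1..k4 and g1, g2 from sample central moments (divisor N)."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    k1 = m
    k2 = n * m2 / (n - 1)
    k3 = n**2 * m3 / ((n - 1) * (n - 2))
    k4 = n**2 * ((n + 1) * m4 - 3 * (n - 1) * m2**2) / ((n - 1) * (n - 2) * (n - 3))
    return k1, k2, k3, k4, k3 / k2 ** 1.5, k4 / k2 ** 2

data = [1, 2, 3, 4, 5, 6, 7]
k1, k2, k3, k4, g1, g2 = k_statistics(data)
print(k1, round(k2, 4), k3, round(k4, 4), g1, round(g2, 4))
```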



Summarizing the points noted about the median, we have:

(1) Its simple and direct meaning recommends its use with popular audiences.

(2) Its standard error can be simply calculated.

(3) It is a terminal statistic and seldom should be used in further computation, for its algebraic manipulation is difficult.

(4) Its reliability increases as the kurtosis of the distribution increases, becoming greater than that of the mean for pronouncedly leptokurtic distributions.

(5) Though it does not lose its simple meaning in the case of skewed distributions, its greatest utility as a measure of central tendency is with distributions which are approximately symmetrical.

SECTION 3. THE MODE

For distributions which are nearly symmetrical 
the practical issue generally is whether the 
measure of central tendency shall be the mean or 
the median, or whether both shall be used, and 
for skewed distributions it is whether it shall 
be the mode, the harmonic mean, or the geometric 
mean, or whether one of these and the mean shall 
be used. The mean holds a preferred position in 
both of these general situations because of its 
nice algebraic properties. 

The harmonic and geometric means are limited 
to distributions, generally skewed, wherein zero 
and negative values of the variate are impossible. 
No such limitation attaches to the mean, median, 
or mode, but the peculiar merit of the mode as 
the measure of maximum likelihood is most ap- 
parent and informative in the case of skewed 
distributions only. We therefore discuss the 
mode in connection with skewness. 

Measures of asymmetry and of skewness: In this text we will call a measure of imbalance of a distribution a measure of asymmetry if it involves the units of measurement, and a measure of skewness if it is independent of them. Thus μ3 is a measure of asymmetry and μ3/σ³ is a measure of skewness. The former changes when the units of measurement are multiplied or divided by any constant, and the latter does not. Pearson's measure of skewness:

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3}$$

Pearson's measure of skewness* [7:12]

Tables yielding the standard error of β1 for different-sized samples and different values of β1 and β2 are given in Pearson's Tables for Statisticians and Biometricians.

Fisher has devised a very similar measure of skewness. It is based, not upon the sample variance and third moment, but upon unbiased estimates of the population variance and third moment. His measure is

$$g_1 = \frac{k_3}{k_2^{3/2}}$$

Fisher's measure of skewness [7:13]

A general formula for the variance error of g1 is not available, though one is frequently needed when dealing with clearly skewed distributions. Fisher (Statistical Methods for Research Workers, 1925 et seq.) does provide the variance error in the case of a sample drawn from a normal population. This, of course, is all that is needed to test departure from normality. It is

$$\sigma^2_{g_1} = \frac{6N(N-1)}{(N-2)(N+1)(N+3)}$$

Variance error of g1 in case of sample drawn from a normal population [7:14]

This formula is recommended not only by its simplicity, but because it applies to small as well as large samples.
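Because [7:14] is claimed to hold for small samples, a quick simulation at N = 20 is a reasonable sanity check. The sketch draws repeated normal samples, computes g1 = k3/k2^(3/2) for each, and compares the observed variance with [7:14]. The seed, the number of replications, and the helper names are arbitrary.

```python
import random
from statistics import pvariance

def g1(xs):
    """Fisher's g1 = k3 / k2^(3/2)."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    k2 = n * m2 / (n - 1)
    k3 = n * n * m3 / ((n - 1) * (n - 2))
    return k3 / k2 ** 1.5

def var_g1_normal(n):
    """[7:14]: variance error of g1 for a sample from a normal population."""
    return 6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3))

rng = random.Random(2024)
n, reps = 20, 4000
sims = [g1([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(reps)]
print(f"simulated {pvariance(sims):.4f}   [7:14] {var_g1_normal(n):.4f}   6/N {6 / n:.4f}")
```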

The critical ratios μ3/σ_μ3 and k3/σ_k3 are serviceable as tests of asymmetry. Formulas [6:61] and [6:62] give the variance error of μ3 in case N is large. In the special case in which N is large and the population normal, it can readily be shown that

$$\sigma^2_{k_3} = \frac{6\sigma^6}{N}$$

Variance error of k3, N large and population normal [7:15]

* Not to be confused with Pearson's (M − Mo)/σ, which he designates as "skewness," the mode herein having been determined from a fitted curve.


The Pearson and Fisher measures of skewness suffer from extreme unreliability for ordinary-sized samples whenever skewness is associated with considerable leptokurtosis, as is quite common in economic and psychological data. In these situations the simpler tests of asymmetry, μ3/σ_μ3 and k3/σ_k3, suffer for the same reason.

To serve with data of this skewed leptokurtic sort we give the following percentile measures of asymmetry and skewness:

$$As = \frac{P_{.07} + P_{.93}}{2} - P_{.50}$$

Percentile measure of asymmetry [7:16]


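The variance error of this measure is given herewith, wherein p = .93, q = .07, and p' = .50. Each i and f bears the subscript of the percentile to which it refers:

$$\sigma^2_{As} = \frac{N}{4}\left[pq\left(\frac{i_p^2}{f_p^2} + \frac{i_q^2}{f_q^2}\right) + \frac{2q^2\,i_p i_q}{f_p f_q} + \frac{i_{p'}^2}{f_{p'}^2} - \frac{2q\,i_{p'}}{f_{p'}}\left(\frac{i_p}{f_p} + \frac{i_q}{f_q}\right)\right]$$

Variance of percentile measure of asymmetry [7:17]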


For an abstract measure of skewness we have

$$Sk = \frac{As}{P_v}$$

Percentile measure of skewness [7:18]

The variance error of this measure is given herewith, with q' = 1 − p' = .50:

$$\sigma^2_{Sk} = \frac{1}{P_v^4}\left\{P_v^2\,V(As) + As^2\,V(P_v) + P_v\,As\,N\left[pq\left(\frac{i_q^2}{f_q^2} - \frac{i_p^2}{f_p^2}\right) + 2qq'\,\frac{i_{p'}}{f_{p'}}\left(\frac{i_p}{f_p} - \frac{i_q}{f_q}\right)\right]\right\}$$

Variance of percentile measure of skewness [7:19]


β1 is always positive. For the other measures, μ3, μ3/σ³, g1, As, and Sk, the sign is positive, and the skewness is called positive, when the long tail of the distribution is to the right, i.e., toward high values of the variate. The expression "skewed to the right" has been variously used. It can easily be avoided, but if used it is recommended that it be synonymous with "positive skewness" as here defined.

We will illustrate these several measures in 
connection with the distribution of Table VII A. 


TABLE VII A

RELATIVE FREQUENCIES IN A PEARSON TYPE III, β1 = 1, β2 = 4.5, DISTRIBUTION, FOR .2σ INTERVALS

X              f         X        f         X              f
Up to −1.8   .0008       .5    .0562       2.9          .0035
−1.7         .0083       .7    .0474       3.1          .0027
−1.5         .0247       .9    .0394       3.3          .0020
−1.3         .0450      1.1    .0323       3.5          .0015
−1.1         .0641      1.3    .0261       3.7          .0011
 −.9         .0784      1.5    .0209       3.9          .0008
 −.7         .0868      1.7    .0166       4.1          .0006
 −.5         .0894      1.9    .0130       4.3          .0005
 −.3         .0873      2.1    .0101       4.5          .0003
 −.1         .0817      2.3    .0078       4.7          .0002
  .1         .0740      2.5    .0060       4.9          .0002
  .3         .0652      2.7    .0046       Above 5.0    .0005
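As a rough check on Table VII A, the sketch below interpolates linearly within the classes to obtain P.07, P.50, and P.93. From these it forms As [7:16], Pv [6:73], and Sk [7:18]. It assumes class boundaries at multiples of .2σ, with each tabled X the midpoint of its class. It also takes the open class "up to −1.8" to begin at −2.0, the lower limit of a Type III curve with β1 = 1; this lower bound does not affect the three percentiles used. As the text warns, grouped values agree only approximately with statistics computed from the exact distribution.

```python
freqs = [
    .0008, .0083, .0247, .0450, .0641, .0784, .0868, .0894, .0873, .0817, .0740, .0652,
    .0562, .0474, .0394, .0323, .0261, .0209, .0166, .0130, .0101, .0078, .0060, .0046,
    .0035, .0027, .0020, .0015, .0011, .0008, .0006, .0005, .0003, .0002, .0002, .0005,
]
# Boundaries in sigma units: -2.0 (assumed), -1.8, -1.6, ..., 5.0, then open above.
edges = [-2.0] + [round(-1.8 + 0.2 * k, 1) for k in range(35)] + [float("inf")]

def grouped_percentile(p):
    """Linear interpolation within the class containing the proportion p."""
    cum = 0.0
    for lo, hi, f in zip(edges, edges[1:], freqs):
        if cum + f >= p:
            return lo + (p - cum) / f * (hi - lo)
        cum += f
    raise ValueError("p outside the table")

P07, P50, P93 = (grouped_percentile(p) for p in (0.07, 0.50, 0.93))
As = (P07 + P93) / 2 - P50            # [7:16]
Pv = P93 - P07                        # [6:73]
Sk = As / Pv                          # [7:18]
print(f"P.07 {P07:.3f}  P.50 {P50:.3f}  P.93 {P93:.3f}  As {As:.3f}  Pv {Pv:.3f}  Sk {Sk:.4f}")
```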
The various derived statistics herewith given have been computed from the exact distribution or from Salvosa's (1930) tabulation in .01σ intervals. An exact agreement with computations based upon Table VII A is therefore not to be expected. We assume a sample of size 100.

M = .00;  Mo = −.50;  P.07 = −1.233;  P.50 = −.164;  P.93 = 1.621;
t.07 = 3.854;  t.50 = 2.383;  t.93 = 11.033;  Pv = 2.854;
σ = 1.00;  β2 = 4.50;  V_Pv = .0849;  V_v = .020;
As = .358;  Sk = .1254;  μ3 = 1.00;  β1 = 1.00;
V_As = .02506;  V_Sk = .004128;  V_μ3 = .3600;  V_β1 = .6750

Critical ratios:

As/σ_As = 2.26;   Sk/σ_Sk = 1.95;   μ3/σ_μ3 = 1.67;   β1/σ_β1 = 1.22

The decreasing size of these four critical ratios is evidence of a decreasing efficiency in As, Sk, μ3, and β1 as measures of asymmetry. Also the less complicated measures As and μ3 are more efficient than the abstract measures Sk and β1. The efficiency of measures changes with change in form of distribution (see Fisher, 1921). Even so, it seems safe to believe that for a wide class of skewed distributions As is more efficient than μ3 as a measure of asymmetry. It is thus recommended for use in a test of asymmetry. It is not, of course, serviceable for further computational work, for its algebraic combinatorial possibilities are very limited.
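Several of the listed values can be reproduced from the percentiles and the t values alone. The sketch below evaluates As, Pv, Sk, V(As) by [7:17], V(Pv) by [6:74], and three of the critical ratios. It assumes that each t is the reciprocal ordinate at the percentile in σ units, so that i/f in the formulas becomes t/N. With that reading the results match the printed V_As, V_Pv, and critical ratios to within rounding of the inputs.

```python
from math import sqrt

N = 100
P07, P50, P93 = -1.233, -0.164, 1.621
t07, t50, t93 = 3.854, 2.383, 11.033     # reciprocal ordinates, sigma units (assumed reading)
p, q = 0.93, 0.07

As = (P07 + P93) / 2 - P50               # [7:16]
Pv = P93 - P07                           # [6:73]
Sk = As / Pv                             # [7:18]

# [7:17] with i/f = t/N and p' = .50
V_As = (p * q * (t93**2 + t07**2) + 2 * q * q * t93 * t07
        + t50**2 - 2 * q * t50 * (t93 + t07)) / (4 * N)
# [6:74] with i/f = t/N
V_Pv = (0.0651 * (t93**2 + t07**2) - 0.0098 * t93 * t07) / N

cr = {
    "As": As / sqrt(V_As),
    "mu3": 1.00 / sqrt(0.3600),          # mu3 = 1.00, V_mu3 = .3600 as listed
    "beta1": 1.00 / sqrt(0.6750),        # beta1 = 1.00, V_beta1 = .6750 as listed
}
print(round(As, 3), round(Pv, 3), round(Sk, 4), round(V_As, 5), round(V_Pv, 4))
print({k: round(v, 2) for k, v in cr.items()})
```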
Comparison of mean and mode: How skewed a distribution should be to warrant the computation of the mode rather than, or in addition to, the mean depends upon (a) which is the more reliable and (b) whether the distinction between them is possible and important for the issue in hand. The matter of reliability is accurately handled by comparing the standard error of the mode with that of the mean. Since there are several ways of computing the mode, both it and its standard error will depend upon the computational method followed. Pearson's method is to fit a curve with few constants (coefficients and exponents) to the entire distribution, and then determine the mode and its standard error from this fitted curve. The merit of this procedure is granted for distributions which are found by test (a χ² goodness-of-fit test of advanced statistics) to fit well a curve defined by just a few constants. The writer questions, however, its general validity for psychological, educational, economic, and social data. In these fields the units of measurement are seldom natural, intrinsic, or inviolable. They are kept if found useful and discarded if not. Furthermore, there is generally no guarantee that the units which one happens to use retain a similar or constant merit throughout low, intermediate, and high ranges of values. We commonly, therefore, desire a mode which is a function, not of the entire data, but only of that portion residing in the neighborhood of the mode. A computational method yielding such a mode, together with its standard error, is given later in this section. That this computational procedure is simpler than Pearson's is not the chief argument in its favor. The chief argument is that it yields a measure of central tendency which is important in itself. This remains so even though the entire distribution may not be defined, and even though the significance of the units of measurement changes gradually when passing from low to high values. The standard error of the mode as thus computed may be less or greater than that of a Pearson mode, depending upon the nature of the distribution.

Maximum likelihood: A certain class of sta- 
tistics is known as "maximum likelihood" measures. 
Think of a sample as derivable from any of a 
number of parent populations, all of the same 
type. The probability that the observed distri- 
bution is a sample from one of these is greater 
than that it is from any other one, so this 
particular parent may be called the maximum 
likelihood population. Its definition, i.e., the 
numerical values of the constants, or parameters, 
entering into its equation, is obtainable from 
the observed distribution. A parameter so ob- 
tained is a maximum likelihood statistic . 

A unimodal population will yield sample distri- 
butions, and the observed mode of a sample is 
more likely to have arisen if the sample is one 
drawn from a parent with this same mode than if 
drawn from any other parent. Thus the sample 
mode, defining with maximum likelihood the parent 
mode, is a maximum likelihood statistic. This is 
true when the only assumption about the parent is 
that it has a mode. 

The matter of maximum likelihood can only be touched upon here, but this should suffice to inform the beginning student that the modal value of any statistic has certain important probability properties that are not possessed by any other value of the statistic.

That the mode is a natural point of reference is suggested by the simplicity of form that certain curves of the Pearson type take when the mode is made the origin. The writer has shown (1940) that certain invariant properties exist for modal age-grade norms that do not exist for other norms. The general and genuine merit of the mode is only clouded by the greater mathematical difficulty of manipulating maximum likelihood statistics rather than mean statistics. For the reasons given, it would seem that the distinction between the mode and the mean is an important one to make when it can be made with certainty and when an issue of central tendency is involved. We cannot relate this matter to the degree of skewness of a distribution, for even a minute skewness is determinable
if the sample is large enough. The statistical establishment of the mode as different from the mean is involved whether the mode is calculated by a fitted-curve method (in which case facilitating tables are given in Pearson's Tables for Statisticians and Biometricians) or by the simpler method here given. It would seem in general to be sufficient, in the case of unimodal distributions, to anticipate a useful distinction between mean and mode when asymmetry, as measured by $\alpha_3$ or $\beta_1$, is established.

The computation of the mode: In Chapter IV an exercise was given in grouping so that the mode shown in a graphic presentation would have a known dependability. We will here give an algebraic method which is an extension of this practice. With fine grouping many modes characteristically appear in the sample, and the crude mode has little to commend it over neighboring values. With coarse grouping the possible sample crude mode values are few and far apart, so a close agreement with the population mode cannot be expected. The moving-average method of computing the mode is intended to remedy the defects of both fine and coarse grouping. Its process, merits, and defects can be illustrated by the data of Table VII B.

Certain of these lengths occur several times, 
but none should be called the crude mode, because 
none is outstanding. We first construct a fre- 
quency distribution, using a fine interval, so 
TABLE VII B

HEAD LENGTHS, IN MM., OF 50 SIXTEEN-YEAR-OLD DELINQUENT BOYS

204  204  184  195  184  183  192  185  193  188
186  189  187  192  189  190  201  191  190  194
181  186  206  179  191  188  178  187  182  189
180  187  185  187  203  182  197  190  194  183
200  193  196  184  186  190  197  187  190  194

small that a computation error of one-half this interval (the moving-average mode must be a class index or, in the case of equal frequencies in neighboring classes, a class boundary) is immaterial in the light of the problem and the size of the sample. We cannot choose an interval less than one millimeter, but this seems sufficiently small. We obtain the first two columns of Table VII C from Table VII B.

TABLE VII C

FREQUENCY TABLE OF LENGTHS, IN MM., OF HEADS OF 50 SIXTEEN-YEAR-OLD DELINQUENT BOYS, ALSO FREQUENCIES FOR DIFFERENT 2, 3, 4, AND 5 MM. GROUPINGS

HEAD LENGTHS    FREQUENCIES FOR VARIOUS CLASS INTERVALS
IN MM.          1 mm.
178               1
179               1
180               1
181               1
182               2
183               2
184               3
The term, moving average, suggests the method of computation. For example, for the 5 mm. interval the first three values are obtained thus:

12 is the sum of 2, 2, 3, 2, 3

15 is the sum of 2, 3, 2, 3, 5

15 is the sum of 3, 2, 3, 5, 2, etc.

The first 15 may be obtained by dropping the first 2 and adding the 5 to the value 12, the second 15 by dropping the 2 and adding the 2 to the first 15, etc.
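The moving sums can be reproduced mechanically from Table VII B. The sketch below assumes the 50 head lengths as transcribed above (two digits in the scanned table were partly illegible and were read as 184 and 196, which agrees with the frequencies 2, 2, 3, 2, 3, 5, 2 quoted in the text). The function names `frequency_table`, `moving_sums`, and `peaks` are hypothetical helpers, not part of the original method's description.

```python
from collections import Counter

# Head lengths (mm) of the 50 boys, as transcribed from Table VII B.
LENGTHS = [
    204, 204, 184, 195, 184, 183, 192, 185, 193, 188,
    186, 189, 187, 192, 189, 190, 201, 191, 190, 194,
    181, 186, 206, 179, 191, 188, 178, 187, 182, 189,
    180, 187, 185, 187, 203, 182, 197, 190, 194, 183,
    200, 193, 196, 184, 186, 190, 197, 187, 190, 194,
]

def frequency_table(data):
    """1 mm class frequencies from min(data) to max(data), zeros included."""
    counts = Counter(data)
    return {x: counts[x] for x in range(min(data), max(data) + 1)}

def moving_sums(freq, width):
    """Sum of `width` consecutive 1 mm frequencies, keyed by the centre of
    the window: a class index for odd widths, a class boundary for even."""
    xs = sorted(freq)
    sums = {}
    for start in range(len(xs) - width + 1):
        window = xs[start:start + width]
        centre = (window[0] + window[-1]) / 2
        sums[centre] = sum(freq[x] for x in window)
    return sums

def peaks(sums):
    """Largest moving sum and every centre at which it occurs."""
    top = max(sums.values())
    return top, [c for c, s in sums.items() if s == top]

freq = frequency_table(LENGTHS)
for width in (1, 5):
    top, where = peaks(moving_sums(freq, width))
    print(f"{width} mm: largest sum {top} at {where}")
```

With 1 mm classes two competing peaks (187 and 190) appear, while the 5 mm moving sums single out 188.0, in agreement with the text.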

With the 1 mm. interval there is no mode with- 
out a near competing mode, so a solution is not 
reached. Nor is there a solution with the 2 mm. 
grouping. With the 3 mm. interval a single mode 
is found and its value is 188.0. According to 
the usual procedure this would be taken as the 
answer. The coarser groupings are given to be 
illustrative of the uncertainty of the solution. 
No unambiguous mode is revealed with the 4 mm. 
grouping, but the value 188.0 appears with the 
5 mm. interval. Some follow the practice of 
taking moving averages of moving averages, with 
doubtful improvement in outcome. The standard 
error of the mode computed by any of the moving- 
average methods is unknown. 

We will compute the mode by determining the maximum point of the best-fitting parabola to the frequencies in the four neighboring classes whose sum of frequencies is greatest. The data of Table VII E show that this parabolically smoothed mode is a very serviceable approximation to the Pearson fitted-curve mode. As one would scarcely consider using the mode as a measure of central tendency in the case of a flat-topped distribution, the failure of formula [7:24] in such cases is not important.

The size of the interval employed is of prime importance. We will determine the number of classes and the class boundaries according to the rules already given in Chapter IV for graphic portrayal. These have, in general, been found to be meritorious for the purpose of computing the mode. We cannot say that the interval given by these rules is the best, for what is best depends upon the nature of the true distribution for the phenomena in question. Investigation, for the normal distribution, of the optimal size of interval to use in determining the parabolically smoothed mode, so as to minimize the standard error of the discrepancy between the frequency at the mode as given by the parabola and the true modal frequency, has resulted in the values of Table VII D. The computational procedure was such that the results are only approximate, though the error in the intervals given is certainly less than .1σ.

TABLE VII D

OPTIMAL INTERVAL, IN TERMS OF THE POPULATION σ, FOR THE COMPUTATION OF THE PARABOLICALLY SMOOTHED MODE, FOR SAMPLES OF SIZE N DRAWN FROM A NORMAL POPULATION

N                   22    30    80    125   200   400   1300
Size of interval    1.1   1.0   .9    .8    .7    .6    .5



A comparison with Table XIII H shows that 
these intervals are considerably larger than 
there given. Obviously the more leptokurtic the 
curve the smaller the optimal interval when ex- 
pressed in terms of the standard deviation, and 
as typically the merit of the mode is associated 
with skewed leptokurtic data, the smaller (though 
still wide) intervals of Table XIII H are to be 
preferred to those of Table VII D. 

Having found a practical solution for this 
troublesome matter of size of interval, we will 
now give the necessary formulas and compute: 
$Mo$ = the parabolically smoothed mode
$\sigma_{Mo}$ = the standard error of this mode
$f_{Mo}$ = the modal frequency
and $\sigma_{f_{Mo}}$ = the standard error of this frequency.

From the general nature of most non-cuspidal unimodal frequency distributions of continuous variables, it is inferred that a second-degree parabola can be made to fit closely to these curves in the neighborhood of their modes. We will accordingly determine the parabola that best fits the four points closest to the mode to be determined. Let us use an arbitrary scale, called a ξ scale, in terms of which the values of the class indexes of the four classes in question are −1.5, −.5, .5, and 1.5 respectively, and let the frequencies of these classes be represented by $f_1$, $f_2$, $f_3$, and $f_4$. One unit upon this arbitrary ξ scale is equal to $i$ units in the original data, and the origin of the ξ scale is the boundary between the second and third classes. The algebraic relation between X and ξ is

$$X = \text{Arbitrary origin} + i\,\xi \qquad [7:20]$$

If the mode is determined in ξ units, then its value in X units is given by

$$Mo_X = \text{Arbitrary origin} + i\,Mo_\xi \qquad [7:21]$$

The equation of the parabola which best fits (in the least squares sense) the four points $(-1.5, f_1)$, $(-.5, f_2)$, $(.5, f_3)$, $(1.5, f_4)$ is

$$f = \tfrac{1}{16}(-f_1 + 9f_2 + 9f_3 - f_4) + (-.3f_1 - .1f_2 + .1f_3 + .3f_4)\,\xi + \tfrac{1}{4}(f_1 - f_2 - f_3 + f_4)\,\xi^2 \qquad [7:22]$$
The equation of the parabola which best preserves the areas, in the least squares sense, in the four intervals is

$$f = \tfrac{1}{12}(-f_1 + 7f_2 + 7f_3 - f_4) + (-.3f_1 - .1f_2 + .1f_3 + .3f_4)\,\xi + \tfrac{1}{4}(f_1 - f_2 - f_3 + f_4)\,\xi^2 \qquad [7:23]$$

which, it will be noted, differs from [7:22] only in the constant term. It thus has the same mode as [7:22].
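The closed forms [7:22] and [7:23] can be checked exactly with rational arithmetic. This sketch solves the least-squares normal equations for the four ξ points directly and compares the result with [7:22]; it then confirms that the parabola of [7:23], integrated over each unit class, reproduces the ordinates of [7:22] at the class indexes, which is what "preserving the areas" amounts to. The helper names and the trial frequencies are hypothetical.

```python
from fractions import Fraction

# Class indexes of the four classes on the xi scale.
XI = [Fraction(-3, 2), Fraction(-1, 2), Fraction(1, 2), Fraction(3, 2)]

def least_squares_parabola(freqs):
    """Coefficients (a, b, c) of a + b*xi + c*xi**2 fitted to the points
    (XI[k], freqs[k]) by least squares, solving the 3x3 normal equations
    exactly by Gauss-Jordan elimination."""
    rows = [
        [sum(x ** (i + j) for x in XI) for j in range(3)]
        + [sum(f * x ** i for x, f in zip(XI, freqs))]
        for i in range(3)
    ]
    for col in range(3):
        pivot = rows[col][col]
        rows[col] = [v / pivot for v in rows[col]]
        for r in range(3):
            if r != col:
                factor = rows[r][col]
                rows[r] = [u - factor * v for u, v in zip(rows[r], rows[col])]
    return tuple(row[3] for row in rows)

def formula_7_22(f1, f2, f3, f4):
    return (Fraction(-f1 + 9 * f2 + 9 * f3 - f4, 16),
            Fraction(-3 * f1 - f2 + f3 + 3 * f4, 10),
            Fraction(f1 - f2 - f3 + f4, 4))

def formula_7_23(f1, f2, f3, f4):
    _, b, c = formula_7_22(f1, f2, f3, f4)
    return Fraction(-f1 + 7 * f2 + 7 * f3 - f4, 12), b, c

def class_area(coeffs, centre):
    """Area under a + b*xi + c*xi**2 over the unit class centred at `centre`."""
    a, b, c = coeffs
    return a + b * centre + c * (centre ** 2 + Fraction(1, 12))

freqs = (6, 15, 14, 9)   # hypothetical class frequencies
print(least_squares_parabola(freqs) == formula_7_22(*freqs))
```

Exact fractions are used so that agreement means identity, not agreement to rounding.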

The mode, or value of ξ for which the frequency is a maximum, is

$$Mo_\xi = \frac{-.6f_1 - .2f_2 + .2f_3 + .6f_4}{-f_1 + f_2 + f_3 - f_4} \qquad \text{(the mode in } \xi \text{ units)} \quad [7:24]$$

Accordingly, substituting this value in [7:21] 
yields the answer which we will call the para- 
bolically smoothed mode . If the denominator of 
[7:24] is negative the sample has an anti-mode 
in this region, so we have evidence of bimodality 
and should consider the method inapplicable. 

Substituting $Mo_\xi$ for ξ in [7:23] yields the modal frequency per $i$ interval

$$f_{Mo}\ (\text{per } i \text{ interval}) = \frac{-f_1 + 7f_2 + 7f_3 - f_4}{12} + \frac{-f_1 + f_2 + f_3 - f_4}{4}\,(Mo_\xi)^2 \qquad [7:25]$$


Of course

$$f_{Mo}\ (\text{per unit interval in } X) = \frac{f_{Mo}\ (\text{per } i \text{ interval})}{i} \qquad [7:26]$$

The variance error of $Mo_\xi$ is approximately given by [7:27].*

$$V_{Mo_\xi} = \frac{f_1 + f_2 + f_3 + f_4}{(-f_1 + f_2 + f_3 - f_4)^2}\,(.2 + Mo_\xi^2) \qquad \text{(variance error of } Mo_\xi) \quad [7:27]$$

$$V_{Mo_X} = i^2\,V_{Mo_\xi} \qquad \text{(variance error of parabolically smoothed mode)} \quad [7:28]$$
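Formulas [7:21] and [7:24] through [7:28] can be gathered into one routine. The sketch below applies it to Table VII B regrouped into 5 mm classes 178–182, 183–187, 188–192, 193–197, 198–202, 203–207, whose frequencies are 6, 15, 14, 9, 2, 4. This grouping is a hypothetical choice made for illustration, not one prescribed by the rules of Chapter IV. The four adjacent classes with the largest total are the first four, so the ξ origin is the 187/188 boundary, X = 187.5. The function name `parabolic_mode` is likewise hypothetical.

```python
import math

def parabolic_mode(f1, f2, f3, f4, origin, i):
    """Parabolically smoothed mode from four adjacent class frequencies.

    `origin` is the X value of the boundary between the 2nd and 3rd
    classes and `i` is the class interval."""
    den = -f1 + f2 + f3 - f4
    if den <= 0:
        # Negative: an anti-mode (evidence of bimodality); zero: flat top.
        raise ValueError("method inapplicable: denominator of [7:24] is not positive")
    mo_xi = (-0.6 * f1 - 0.2 * f2 + 0.2 * f3 + 0.6 * f4) / den        # [7:24]
    v_mo_xi = (f1 + f2 + f3 + f4) / den**2 * (0.2 + mo_xi**2)        # [7:27]
    f_mo_i = (-f1 + 7 * f2 + 7 * f3 - f4) / 12 + den / 4 * mo_xi**2  # [7:25]
    return {
        "mode": origin + i * mo_xi,                                  # [7:21]
        "se_mode": math.sqrt(i**2 * v_mo_xi),                        # [7:28]
        "f_mode_per_i": f_mo_i,
        "f_mode_per_unit": f_mo_i / i,                               # [7:26]
    }

result = parabolic_mode(6, 15, 14, 9, origin=187.5, i=5)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

With this grouping the sketch gives a mode of about 188.07 mm with a standard error of about 1.09 mm, close to the 188.0 found by the moving-average method.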


* Calculation of the standard error of the parabolically smoothed mode.

We have

$$Mo_\xi = \frac{-.6f_1 - .2f_2 + .2f_3 + .6f_4}{-f_1 + f_2 + f_3 - f_4} = \frac{Num}{Den}$$

Taking logarithmic differentials (see Chapter XIII, Section 8) we have

$$\frac{d\,Mo_\xi}{Mo_\xi} = \frac{d\,Num}{Num} - \frac{d\,Den}{Den}$$

Squaring, summing, and dividing by the number of samples summed, we have

$$\frac{V_{Mo_\xi}}{Mo_\xi^2} = \frac{V_{Num}}{Num^2} + \frac{V_{Den}}{Den^2} - \frac{2\,C_{Num,Den}}{Num \cdot Den}$$

We let $f_1/N = p_1$ and $q_1 = 1 - p_1$. Similar definitions hold for p's and q's with other subscripts. We also let

$$p = \frac{p_1 + p_2 + p_3 + p_4}{4}$$

(the approximations below treat $p_1, p_2, p_3, p_4$ as nearly equal to $p$, as they are near the mode). Utilizing relationships of the types

$$V_{f_1} = Np_1q_1 \quad \text{and} \quad C_{f_1 f_2} = -Np_1p_2$$

$$V_{Num} = N(.36p_1q_1 + .04p_2q_2 + .04p_3q_3 + .36p_4q_4 - .24p_1p_2 + .24p_1p_3 + .72p_1p_4 + .08p_2p_3 + .24p_2p_4 - .24p_3p_4) \doteq .8Np$$
As a check upon the excellence of the mode as given by [7:24] and [7:21], we may compare it as thus computed with the value as determined by more refined methods. Volume one of Biometrika was scanned to find illustrations of data to which Pearson curves had been fitted and the mode determined from the fitted curves. Articles by Powys (1901), Macdonell (1901), Pearson (1902, Sys.), and Fawcett (1902) serve excellently. The data by Powys are particularly valuable because of the large size of samples. Testing the parabolically smoothed mode against the mode from the fitted curve for these large samples is a severe test so far as systematic differences are concerned. Examination of the results as shown in Table VII E reveals no systematic error, and it further shows that the chance error is adequately represented by the formula given, [7:28], for the variance error of the parabolically smoothed mode.


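FOOTNOTE CONTINUED:

$$V_{Den} = N(p_1q_1 + p_2q_2 + p_3q_3 + p_4q_4 + 2p_1p_2 + 2p_1p_3 - 2p_1p_4 - 2p_2p_3 + 2p_2p_4 + 2p_3p_4) \doteq 4Np$$

$$C_{Num,Den} = N(.6p_1q_1 - .2p_2q_2 + .2p_3q_3 - .6p_4q_4 + .4p_1p_2 + .8p_1p_3 + 0 + 0 - .8p_2p_4 - .4p_3p_4) \doteq 0$$

Collecting terms, and noting that $Num = Mo_\xi \cdot Den$ and $4Np = f_1 + f_2 + f_3 + f_4$,

$$V_{Mo_\xi} = \frac{4Np}{Den^2}(.2 + Mo_\xi^2) = \frac{f_1 + f_2 + f_3 + f_4}{(-f_1 + f_2 + f_3 - f_4)^2}(.2 + Mo_\xi^2)$$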

TABLE VII 
A general formula for the variance error of the parabolically smoothed modal frequency is involved. It is simple for the special case in which the true value of $Mo_\xi = 0$. We then have

$$V(f_{Mo}\ \text{per } i \text{ interval}) = \frac{1}{144N}\big[f_1(N-f_1) + 49f_2(N-f_2) + 49f_3(N-f_3) + f_4(N-f_4) + 14f_1f_2 + 14f_1f_3 - 2f_1f_4 - 98f_2f_3 + 14f_2f_4 + 14f_3f_4\big]$$

$$\text{(variance of } f_{Mo} \text{ when } Mo_\xi = 0) \qquad [7:29]$$

When $Mo_\xi$ does not equal zero, all we can assert is that the variance error of ($f_{Mo}$ per $i$ interval) is greater than the value given by [7:29].
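Formula [7:29] is the multinomial sampling variance of the linear combination $(-f_1 + 7f_2 + 7f_3 - f_4)/12$, with each $p_k$ estimated by $f_k/N$. The sketch below, a check under that reading, evaluates [7:29] term by term and compares it with the compact expression $\Sigma w_k^2 f_k - (\Sigma w_k f_k)^2/N$. The frequencies are hypothetical and the helper names are illustrative only.

```python
from fractions import Fraction

def var_modal_freq_7_29(f1, f2, f3, f4, n):
    """[7:29] written out term by term (valid when the true Mo_xi = 0)."""
    bracket = (f1 * (n - f1) + 49 * f2 * (n - f2) + 49 * f3 * (n - f3)
               + f4 * (n - f4) + 14 * f1 * f2 + 14 * f1 * f3 - 2 * f1 * f4
               - 98 * f2 * f3 + 14 * f2 * f4 + 14 * f3 * f4)
    return Fraction(bracket, 144 * n)

def var_linear_combination(weights, freqs, n):
    """Multinomial sampling variance of sum(w_k * f_k), with p_k estimated
    by f_k / n:  V = sum(w^2 f) - (sum(w f))^2 / n."""
    s1 = sum(w * w * f for w, f in zip(weights, freqs))
    s2 = sum(w * f for w, f in zip(weights, freqs))
    return s1 - s2 * s2 / n

freqs, n = (6, 15, 14, 9), 50                          # hypothetical
weights = [Fraction(w, 12) for w in (-1, 7, 7, -1)]   # f_Mo at Mo_xi = 0
print(var_modal_freq_7_29(*freqs, n), var_linear_combination(weights, freqs, n))
```

The two printed values agree exactly, which confirms the signs and coefficients of the cross-product terms in [7:29].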

SECTION 4. THE GEOMETRIC MEAN

The use of the geometric mean as an average 
might be suggested by discovery that, for the 
data in question, it was more reliable than the 
other common averages. However, the determina- 
tion of this may be a very complicated under- 
taking. In ordinary practice the logic of the 
situation rather than an objective measure of 
reliability has been the consideration which has 
led to the use of the geometric mean. Though in 
general we believe this has been sound practice, 
the statistician would be more satisfied if the 
use of the geometric mean were fortified by dem- 
onstrable evidence of reliability. 

Data for which the geometric mean seems ap- 
propriate have the following characteristics: 
(a) The raw measures are all positive and are 
deviations from an indubitable and meaningful 
zero point. (b) Thinking is facilitated when 
relative, not absolute, amounts are kept in mind. 

(c) It follows as a consequence that certain tests of consistency (e.g., the time and quantity reversal tests applied to index numbers) are frequently satisfied by the geometric mean and not by alternative averages. In the following discussion the reader should think of the raw X scores as fulfilling conditions (a) and (b).

If $X_a, X_b, X_c, \ldots, X_n$ are the measures in a series, and if $\Pi X$ stands for the product of them all, we have, by definition,

$$G.M. = \sqrt[N]{\Pi X} \qquad [7:30]$$

Taking logarithms we have

$$\log(G.M.) = \frac{\log X_a + \log X_b + \cdots + \log X_n}{N} = \frac{\Sigma \log X}{N} = \frac{\Sigma Y}{N} = M_Y$$

where Y is the logarithm of X. Thus corresponding to a geometric mean of the X's is an arithmetic mean of the Y's, and all the error and probability properties attaching to the arithmetic mean can, by utilizing the one-to-one relationship that exists, be attached to the geometric mean. The probability that $\widetilde{G.M.}$ (the true G.M.) exceeds the obtained G.M. by an amount Δ is exactly the probability that $\tilde{M}_Y$ exceeds $M_Y$ by an amount δ, the relationships between quantities being

$$\log \widetilde{G.M.} = \tilde{M}_Y; \quad \log G.M. = M_Y; \quad \Delta = \widetilde{G.M.} - G.M.; \quad \delta = \tilde{M}_Y - M_Y$$

The form of distribution of the Y measures and the variance error of $M_Y$ $\left(= \dfrac{V_Y}{N-1}\right)$ are as simple to determine as the form of distribution and mean of any raw measures. This relationship between measures and their logarithms suggests the following practice:

When the geometric mean is used, interpret relationships between measures in relative terms, but judge of the confidence to be placed in the geometric mean via the confidence to be placed in its logarithm.
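The rule just stated can be sketched as follows: work entirely with $Y = \log X$, form limits for $M_Y$, and carry them back to the X scale by exponentiating. This sketch assumes natural logarithms, the variance error $V_Y/(N-1)$ with $V_Y$ computed using divisor N, and a multiplier of 1.96 on the assumption that $M_Y$ is close to normally distributed. The data and the function name are hypothetical.

```python
import math
import statistics

def geometric_mean_limits(xs, z=1.96):
    """Geometric mean of positive measures and limits obtained by working
    with Y = log X, then converting back with exp()."""
    if any(x <= 0 for x in xs):
        raise ValueError("all measures must be positive")
    ys = [math.log(x) for x in xs]
    n = len(ys)
    m_y = statistics.fmean(ys)
    se_m_y = math.sqrt(statistics.pvariance(ys) / (n - 1))  # sqrt(V_Y / (N - 1))
    return (math.exp(m_y),
            math.exp(m_y - z * se_m_y),
            math.exp(m_y + z * se_m_y))

data = [2.3, 4.1, 3.7, 5.9, 2.8, 4.4]    # hypothetical positive measures
gm, lower, upper = geometric_mean_limits(data)
print(f"G.M. = {gm:.3f}, limits {lower:.3f} to {upper:.3f}")
```

The limits are symmetric in ratio (G.M. divided by the lower limit equals the upper limit divided by G.M.) rather than in absolute amount, matching the relative interpretation the rule calls for.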

Confidence as here used is that due to the 
nature and size of the sample. In many, perhaps 
most, situations wherein the geometric mean is 
appropriate a sample does not exist. If the 
population of a city in 1930 is 152760 and in 
1940 it is 186210, we do not say that we have a 
sample of two. We have two observations in a 
time series and these observations at different 
moments of time are not assumed to be two obser- 
vations of the same thing drawn from an infinite 
population. We may ask, "What is the average 
yearly rate of growth?" The ten-year growth is 
33450. We do not divide this by ten and call 
the yearly growth 3345, but assume a constant 
yearly rate of growth. 

Population in 1930 = 152760

Assume in 1931 it = $152760(1+r)$

Assume in 1932 it = $152760(1+r)^2$

etc.

Assume in 1940 it = $152760(1+r)^{10} = 186210$

Taking logarithms and solving, r = .0200.

We find an average increase of 2 per cent a year.
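The calculation above amounts to taking a geometric mean: $1 + r$ is the geometric mean of the ten yearly growth ratios, whose product is $186210/152760$. A minimal sketch, using only the two census figures from the text:

```python
import math

pop_1930, pop_1940, years = 152760, 186210, 10

# Constant yearly rate r with pop_1930 * (1 + r) ** years == pop_1940,
# solved with logarithms as in the text.
r = math.exp((math.log(pop_1940) - math.log(pop_1930)) / years) - 1

print(f"r = {r:.4f}")
print(f"simple yearly increase = {(pop_1940 - pop_1930) / years:.0f} persons")
```

The second line prints the figure of 3345 per year which, as the text explains, is not the appropriate summary when growth is thought of as compounding.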

Had we the following data: mean vocabulary of 100 I.Q. sixteen-year-olds, 15276 words, and of 110 I.Q. sixteen-year-olds, 18621 words, we would not have dealt with rates but would have said that there was an average increase of 334.5 words per unit increase in I.Q. We note that which procedure is appropriate has depended upon our general knowledge of the situations, that is, upon considerations outside the numerical values of the measures themselves. We may believe this approach justified when dealing with statistical series which are not similar measures drawn from an infinite population. This situation is very common when dealing with temporal and geographic data.

The time reversal test is of proven importance in connection with economic indexes. One would anticipate its importance in connection with fatigue, learning, growth, and sundry sociological phenomena of a temporal nature, provided measurements from a true zero point are available. In economics the test requires that if a price (or quantity) index based upon a number of commodities shows a certain relationship between a first and a second date, the reciprocal relationship is to be shown between the second and the first date. Consider the following data covering a stable period, 1913, and an inflated period, 1918.

TABLE VII F

PRICES AND PRICE RATIOS OF CERTAIN COMMODITIES, 1913, 1918

                       PRICES                    RATIOS
                    1913      1918      1913 on 1918    1918 on 1913
Bacon              .1236     .2612         .47320          2.11327
Pork               .1486     .2495         .59559          1.67900
Lard               .1101     .2603         .42297          2.36421
Arithmetic
  price index      .12743    .25700        .49725*         2.05216*
                                           .49584
Geometric
  price index      .12646    .25694        .49216#         2.03188#
                                           .49216

* These values are the arithmetic means of preceding ratios.

# These values are the geometric means of preceding ratios.
The ratio of the price of bacon in 1913 to that in 1918, .47320, is the reciprocal of the ratio of the 1918 price to that of 1913, 2.11327, so the time reversal test holds for prices of single commodities. The arithmetic average of the 1913 ratios, .49725, is not the reciprocal of the arithmetic average of the 1918 ratios, 2.05216, so the time reversal test does not hold for these mean ratios. Also the value of the mean of the 1913-on-1918 ratios, .49725, does not equal the ratio of the means, .49584.

In the case of the geometric mean of the price ratios, the time reversal test does hold, for .49216 is the reciprocal of 2.03188. Also the ratio of the geometric price indexes, .49216, is identical with the geometric mean of the separate price ratios. It is left to the student as an exercise to prove that this is necessarily so.

In this illustrative example the three com- 
modities have been weighted equally (each weighted 
1), but clearly if weighted unequally all that 
is necessary is that each commodity be taken as 
many times as the weight, and then, proceeding 
as here shown, we again find the time reversal 
test holding for the geometric means. There are 
other price indexes than the geometric mean for 
which the time reversal test holds, but none 
which are more simple algebraically. 
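The time reversal test on the Table VII F prices can be run directly. The sketch below compares the forward (1913 on 1918) and backward (1918 on 1913) indexes for an arithmetic and a geometric mean of price ratios, with optional weights applied as though each commodity were taken as many times as its weight, as the text describes. The function names are hypothetical, and the unequal weights used in the usage note are illustrative only.

```python
import math

# Prices from Table VII F.
prices_1913 = {"bacon": 0.1236, "pork": 0.1486, "lard": 0.1101}
prices_1918 = {"bacon": 0.2612, "pork": 0.2495, "lard": 0.2603}

def arithmetic_index(p_a, p_b, weights=None):
    """Weighted arithmetic mean of the price ratios p_a / p_b."""
    weights = weights or {k: 1 for k in p_a}
    total = sum(weights.values())
    return sum(weights[k] * p_a[k] / p_b[k] for k in p_a) / total

def geometric_index(p_a, p_b, weights=None):
    """Weighted geometric mean of the price ratios p_a / p_b; a weight of w
    is equivalent to taking the commodity w times."""
    weights = weights or {k: 1 for k in p_a}
    total = sum(weights.values())
    return math.exp(sum(weights[k] * math.log(p_a[k] / p_b[k]) for k in p_a) / total)

for name, index in (("arithmetic", arithmetic_index), ("geometric", geometric_index)):
    forward = index(prices_1913, prices_1918)
    backward = index(prices_1918, prices_1913)
    print(f"{name}: {forward:.5f} x {backward:.5f} = {forward * backward:.5f}")
```

The arithmetic product comes out about 1.02 rather than 1, while the geometric product is 1 up to rounding error. Passing hypothetical unequal weights such as `{"bacon": 2, "pork": 1, "lard": 3}` leaves the geometric product at 1.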

The ideal distribution of the cases consti- 
tuting a sample in which to use the geometric 
mean is one in which the logarithms of the meas- 
ures are distributed normally. The data of Table 
VII G yield such a distribution. 

Because of the extreme skewness of these data 
it has been necessary to report frequencies for 
unequally spaced intervals. A plot of these 
data as a frequency polygon will be informative 
in strikingly revealing the form of distribution 
for which the geometric mean is most appropriate. 
If logarithms of these X measures to the base e 
TABLE VII G

DISTRIBUTION OF 1000 MEASURES THE LOGARITHMS OF WHICH YIELD A NORMAL DISTRIBUTION

   X      ORDINATE    INTERVAL    FREQUENCY PER
             z           i        INTERVAL = 1000zi
  .167     .0008      .166667          +
  .333     .0099      .166667          2
  .500     .0212      .166667          4
  .667     .0331      .166667          6
 1.0       .0539      .5              27
 1.5       .0745      .5              37
 2.0       .0848      .5              42
 2.5       .0886      .5              44
           (e = Mode)
 3.0       .0884      .5              44
 3.5       .0861      .5              43
 4.0       .0825      .5              41
 4.5       .0782      .5              39
 5.0       .0738      .5              37
 5.5       .0693      .5              35
 6.0       .0653      .5              33
 6.5       .0608      .5              30
 7.0       .0568      .5              28
           (e² = Mdn)
 7.5       .0531      .5              27
 8.0       .0496      .5              25
 8.5       .0464      .5              23
 9.0       .0434      .5              22
 9.5       .0406      .5              20
10.5       .0357      1.5             54
12.0       .0295      1.5             44
13.5       .0246      1.5             37
15.0       .0207      1.5             31
16.5       .0175      1.5             26
18.0       .0149      1.5             22
20.0       .01218     2.5             31
TABLE VII G (CONTINUED)



are found and a distribution made, it will be normal, with a mean of 2.0 and a variance of 1.0, and these statistics will have small standard errors. For the X measures the mean and variance would be, of course, very poor statistics.
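The contrast between the X measures and their logarithms can be seen by simulation. This sketch assumes a parent like that of Table VII G, measures whose natural logarithms are normal with mean 2.0 and variance 1.0, and draws a large hypothetical sample from it with Python's `random.lognormvariate`. The seed and sample size are arbitrary choices.

```python
import math
import random
import statistics

random.seed(7)  # fixed seed so the run is repeatable

# 100,000 hypothetical measures whose natural logarithms are normal with
# mean 2.0 and standard deviation 1.0.
xs = [random.lognormvariate(2.0, 1.0) for _ in range(100_000)]
ys = [math.log(x) for x in xs]

print(f"mean of log X     = {statistics.fmean(ys):.3f}")
print(f"variance of log X = {statistics.pvariance(ys):.3f}")
print(f"geometric mean    = {statistics.geometric_mean(xs):.2f}  (e**2 = {math.e**2:.2f}, the median)")
print(f"arithmetic mean   = {statistics.fmean(xs):.2f}  (e**2.5 = {math.e**2.5:.2f})")
```

The logarithms reproduce the normal parameters closely and the geometric mean lands near the median $e^2$, while the arithmetic mean of X is pulled well above both by the long right tail.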

SECTION 5. THE HARMONIC MEAN

By definition the harmonic mean, HM, of a series of measures, X, is the reciprocal of the mean of the reciprocals, thus,

$$\frac{1}{HM} = \frac{1}{N}\,\Sigma\frac{1}{X} \qquad \text{(harmonic mean)} \quad [7:31]$$

All X measures must be positive. Let $Y = 1/X$. To compute the HM we compute the Y values, obtain their mean, and take the reciprocal. Having a distribution of Y values we can obtain $M_Y$, $V_Y$, and $V_{M_Y}$ by procedures already given. Thus, to obtain a measure of confidence in the HM, we first secure a measure of the reliability of $M_Y$. For any fiducial limits in HM that we may be interested in, there are corresponding limits in $M_Y$, and we judge the trustworthiness of $M_Y$ via methods already given for testing the significance of the arithmetic mean.

The HM is serviceable in connection with "work limit" measures. In the phraseology of psychology a test calling for a fixed amount of work, and scored on the basis of time required to accomplish it, is called a "work limit" test. It stands in distinction to a "time limit" test, which is one having a fixed time limit and scored on the basis of amount done. Let us consider the case of individuals A, B, and C, able to accomplish 30, 40, and 50 units of work, respectively, in one hour. If tested by means of a half-hour time-limit test, their amount-done scores will be 15, 20, and 25. If tested by means of a 60-item work-limit test, their time-required scores will be 120, 90, and 72 minutes respectively, showing quite different relationships than do the time-limit test scores. The time-limit data yield an average score of 20 units in 30 minutes, which is at the rate of one unit in 1.5 minutes. We desire to get the same information from the work-limit scores, but their mean, 94, equivalent to a rate of one unit done in 1.5667 minutes, does not give it. For the HM we have

$$\frac{1}{HM} = \frac{1}{3}\left(\frac{1}{120} + \frac{1}{90} + \frac{1}{72}\right) = \frac{1}{90}$$

Thus the HM time required to perform 60 units of work is 90 minutes, which is at the rate of one unit in 1.5 minutes, checking with the time-limit test results.
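The work-limit example can be verified exactly with rational arithmetic, applying [7:31] to the three time-required scores from the text. This is a minimal sketch; the variable names are illustrative only.

```python
from fractions import Fraction

ITEMS = 60                       # length of the work-limit test
minutes = [120, 90, 72]          # time-required scores of A, B and C

# [7:31]: reciprocal of the mean of the reciprocals, done exactly.
hm = len(minutes) / sum(Fraction(1, t) for t in minutes)
am = Fraction(sum(minutes), len(minutes))

print(f"harmonic mean   {hm} min -> {hm / ITEMS} min per unit")
print(f"arithmetic mean {am} min -> {float(am / ITEMS):.4f} min per unit")
```

Only the harmonic mean recovers the rate of one unit in 1.5 minutes given by the time-limit scores.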

To determine which mean to employ, the student must decide for his particular situation which mean is the more immediately interpretable: that whose base is constant time (the time-limit test score), or that whose base is constant product (the work-limit test score). Ordinarily, the meaning in the reader's mind of a unit of time is both precise and comparable from one situation to another, but the meaning of a unit of work, even when precise (as, e.g., one ton of coal, one automobile, or one ton-mile of transportation), is not comparable from situation to situation. Thus the HM is suggested as the appropriate average to use in a work-limit test in psychology or in industry, where a common procedure is to record the time it takes a workman to perform a unit of work.

Another obvious consideration is to select the average with the smaller standard error. If the Y measures are more nearly normally distributed than the X measures, the HM will generally be more trustworthy than the M.

It was noted that the G.M. price index yields consistent results as changes in the basal year are made, and that in this respect there is a systematic error in the arithmetic mean index. There is a systematic error in the HM price index, but it is of the opposite sort to that in the M price index. As a result, the average (preferably geometric) of an M and an HM price index is unbiased.
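One concrete sense in which the two errors cancel can be checked on the Table VII F ratios. The HM of the reciprocal ratios is exactly the reciprocal of the M of the ratios, so on the time reversal test the M index errs by some factor and the HM index errs by the reciprocal factor; their geometric mean therefore passes the test. The sketch below assumes equal weights, and its helper names are hypothetical.

```python
import math

# Price ratios 1913 on 1918 from Table VII F (bacon, pork, lard).
ratios = [0.1236 / 0.2612, 0.1486 / 0.2495, 0.1101 / 0.2603]
reciprocals = [1 / r for r in ratios]          # 1918 on 1913

def am(xs):
    return sum(xs) / len(xs)

def hm(xs):
    return len(xs) / sum(1 / x for x in xs)

def am_hm_geometric(xs):
    """Geometric mean of the arithmetic-mean and harmonic-mean indexes."""
    return math.sqrt(am(xs) * hm(xs))

for name, index in (("M", am), ("HM", hm), ("sqrt(M x HM)", am_hm_geometric)):
    print(f"{name:>13}: forward x backward = {index(ratios) * index(reciprocals):.5f}")
```

The M product is above 1, the HM product is below 1 by the reciprocal factor, and the combined index gives exactly 1 apart from rounding.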
PROBLEMS

Problem 1. Prove that the G.M. of ratios of measures in two paired series is equal to the ratio of the G.M.'s of the two series. Suggested notation follows:

Let $p_{1a}$ = price of commodity a at date 1, and similarly for $p_{1b}, \ldots, p_{1n}, p_{2a}, p_{2b}, \ldots, p_{2n}, p_{3a}, p_{3b}, \ldots, p_{3n}$, etc. Then $p_{1a}/p_{2a}$ = a price ratio of commodity a at date 1 to date 2. Any average of these for several commodities, which we will call $I_{12}$, is a price index. Problem: to find such an average that

$$\frac{I_{12}}{I_{32}} = I_{13}$$

This asserts, for example, that if the price indexes using 1920 as the basal year, $I_{1910,1920}, I_{1911,1920}, \ldots, I_{1930,1920}$, are available, the price indexes for any other basal year, say 1930, are available, for

$$I_{1910,1930} = \frac{I_{1910,1920}}{I_{1930,1920}}, \quad I_{1911,1930} = \frac{I_{1911,1920}}{I_{1930,1920}}, \quad \text{etc.}$$

Prove that the simple G.M. of price ratios is such an average. Also prove that the arithmetic mean of price ratios is not.


CHAPTER VIII 
THE NORMAL DISTRIBUTION 

SECTION 1. THE NORMAL DISTRIBUTION AS
DESCRIPTIVE OF CHANCE DISTRIBUTIONS 
AND DISTRIBUTIONS OF ERRORS 

Phenomena of great diversity have suggested the normal distribution to the mind of man. It inevitably came to the attention of analytical gamesters interested in the probability of events when coins were tossed, dice thrown, cards drawn from a deck, and roulette wheels spun. An important relationship inherent in all of these cases is that, starting with distributions of raw data which are rectangular, or point rectangular, one approaches normal distributions of statistics which are aggregates or averages. If an unbiased die is thrown many, say N, times we approach the following distribution:

Number of pips            1     2     3     4     5     6
Number of occurrences    N/6   N/6   N/6   N/6   N/6   N/6

which we call a point rectangular distribution. This certainly has no semblance to a normal distribution. Consider a still simpler case: if an unbiased coin is tossed N times, the distribution of the number of heads showing will approach the following:

Number of heads           0     1
Number of occurrences    N/2   N/2

If we throw 20 coins at a time and compute for each throw the mean number of heads, we will approach the binomial distribution given by the successive terms in the expansion of $(.5 + .5)^{20}$, which is very nearly normal. Though the distribution of single items is two-point rectangular, the distribution of means of such tends toward normality. The writer knows of no exception to the proposition that the distribution of statistics which are means or aggregates of original measures, no matter their form, tends toward normality if the number of cases entering into the mean or aggregate is large.
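How close the terms of $(.5 + .5)^{20}$ already are to a normal curve can be checked directly. The sketch compares the binomial probability of each number of heads with the normal ordinate having the same mean, $np = 10$, and standard deviation, $\sqrt{npq} = \sqrt{5}$. The function names are illustrative.

```python
import math

n, p = 20, 0.5                       # 20 unbiased coins per throw
mean, sd = n * p, math.sqrt(n * p * (1 - p))

def binomial_pmf(k):
    """Probability of exactly k heads: a term of the expansion of (.5 + .5)**20."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def normal_ordinate(x):
    """Normal density with the same mean and standard deviation."""
    return math.exp(-((x - mean) ** 2) / (2 * sd**2)) / (sd * math.sqrt(2 * math.pi))

worst = max(abs(binomial_pmf(k) - normal_ordinate(k)) for k in range(n + 1))
for k in range(6, 15):
    print(f"{k:2d} heads  binomial {binomial_pmf(k):.4f}  normal {normal_ordinate(k):.4f}")
print(f"largest discrepancy over k = 0..20: {worst:.4f}")
```

Even with only 20 two-point items per throw, no term differs from the normal ordinate by more than about .002.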

The normal distribution is a limiting form, and the number of paths by which it may be approached is infinite. A. DeMoivre (1733) is to be credited with being the first mathematician to derive the normal curve and compute probabilities based thereon.

At an early date astronomers found that the 
distribution of their angular observations of a 
star, which could only be considered to be errors 
of observation, approached normality as the 
number of observations increased. We have here 
a phenomenon that is normally distributed, though 
it is not an average or aggregate. In general we may anticipate that when a fixed point is aimed at, either the distribution of errors resulting will be normal or some simple transformation of these measures will be so distributed. To cite two cases involving
transformed values: If lifted weights are com- 
pared with a standard weight the distribution 
of errors, expressed in grams, will not be as 
closely normal as will the distribution of errors 
if the logarithms of weights have been employed. 
Again, if teachers attempt to classify pupils 
according to mental age, attempting, say, to put 
into a single group all those of a certain men- 
tality, the distribution of their errors expressed 
in terms of mental age will not be as closely 
normal as will the distribution expressed in 
sensed difference units. The writer offers 
the hypothesis that all errors of observation 
are normally distributed when expressed in units 
which are intrinsically appropriate to the sense 
organ or faculty making the observation. 

SECTION 2. THE NORMAL DISTRIBUTION AS DESCRIPTIVE 
OF BIOLOGICAL AND SOCIAL PHENOMENA 

Abraham DeMoivre, the discoverer in 1733 of the normal curve as a limiting form of the binomial $(a+b)^n$, also conceived of it in the broadest manner, for to him deviations from it were no less than fluctuations from the original design of the Deity. It was not only descriptive of the universe of physics, but also of biology, sociology, philosophy, and religion. DeMoivre's successors scarcely held this breadth of view, though Laplace later in the same century and Gauss, beginning with his Theoria motus corporum coelestium in 1809, both greatly extended the concept in connection with mathematics and the physical universe. Still later the astronomer Quetelet turned his attention to human affairs, writing, in 1846, Lettres ... sur la théorie des probabilités, appliquée aux sciences morales et politiques, and in 1871 Anthropométrie ou Mesure des Différentes Facultés de l'Homme.
Herewith is a striking table prepared by Quetelet:


TABLE VIII A

QUETELET'S TABLE OF HEIGHTS OF
AMERICAN CIVIL WAR SOLDIERS

                     Number of men      Proportion per 1000     Difference
Height in metres*    counted, per       of the men measured     between observed
                     interval of                                and calculated
                     0.0255 m           Observed   Calculated   values

1.397 to 1.524             31               1           2             -1
1.549                      15               1           3             -2
1.575                      50               2           9             -7
1.600                     526              20          21             -1
1.626                    1237              48          42             +6
1.651                    1947              75          72             +3
1.676                    3019             117         107            +10
1.702                    3475             134         137             -3
1.727                    4054             157         153             +4
1.753                    3631             140         146             -6
1.778                    3133             121         121              0
1.803                    2075              80          86             -6
1.829                    1485              57          53             +4
1.854                     680              26          28             -2
1.880                     343              13          13              0
1.905                     118               5           5              0
1.930                      42               2           2              0
1.956 [and over]           17               1           0             +1

Total                   25878            1000        1000        -28, +28

* INTERNATIONALER STATISTISCHER CONGRESS IN BERLIN, t. II, page 748.
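The Observed column is each count expressed per 1000 of the 25878 men measured. As a check, here is a minimal Python sketch (the lists are typed in from Table VIII A, and rounding to the nearest whole number is an assumption about how the proportions were rounded) that reproduces the Observed and Difference columns:

```python
# Counts of men in each height class of Table VIII A, shortest class first.
COUNTS = [31, 15, 50, 526, 1237, 1947, 3019, 3475, 4054,
          3631, 3133, 2075, 1485, 680, 343, 118, 42, 17]
# Quetelet's "calculated" (normal-curve) proportions per 1000, copied from the table.
CALCULATED = [2, 3, 9, 21, 42, 72, 107, 137, 153,
              146, 121, 86, 53, 28, 13, 5, 2, 0]

def per_thousand(counts):
    """Express each class count per 1000 cases, rounded to a whole number."""
    total = sum(counts)
    return [round(1000 * c / total) for c in counts]

observed = per_thousand(COUNTS)
differences = [o - c for o, c in zip(observed, CALCULATED)]
print(sum(COUNTS), observed)
print(differences)
```

The negative and positive differences each sum to 28 in absolute value, as in the table's total row.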


Could any student, had he been the first to make the tabulations shown in Table VIII A, fail to be struck with wonder at the evidence of a permeating design? Quetelet wrote of this and other data (1871):

"It is especially noteworthy that human height, though apparently having been developed in a fortuitous manner, is nevertheless subject to a very exact law; and this is not peculiar to height, but is observable also in connection with weight, strength and speed of man, and further in his intellectual and moral qualities. We see one of the most wonderful laws of creation in this orderly diversity which is accomplished entirely without the intervention of the human will."

This was both a return to the breadth of view of DeMoivre and an immediate stimulus to Francis Galton, whose studies of hereditary genius became enriched with the concepts of normal distribution and correlation.

In Galton's Natural Inheritance (1889) is found a table entitled "Data for Schemes of Distribution of Various Qualities and Faculties among the Persons Measured at the Anthropometric Laboratory of the International Exhibition of 1884." Galton used the median as the measure of central tendency and Q', a smoothed estimate of the quartile deviation, as the measure of variability. Using these and the normal distribution he derived measures as shown in Table VIII B.

With the more refined present-day methods of testing goodness of fit, the statistician could show that some of the traits of Table VIII B are not normally distributed; nevertheless the departure from normality is not great, and the major impression gained by Galton, that the traits are normally distributed, was a great advance over that of his contemporaries, who, except for a few, must have held either no view or a decidedly erroneous one.

The nature of the distribution of mental traits is, of course, a function of (a) the method of selection of the group examined, (b) the units of measurement employed, and (c) the precision, or freedom from chance factors, of the measure employed. If there are large chance errors we should anticipate that, like errors of observation, they would tend to be distributed normally. Thorndike and Bregman (1924), after making adequate allowance for the normalizing influence of chance factors, concluded "that intellect in the ninth grade, if measured in truly equal units, is distributed approximately in [the normal manner]." This finding pertained to "word tests," "space tests," and a more general test involving "selective and relational thinking, and generalization and organization," each taken separately, and also to a composite based upon nine mental test measures given to some 14,000 widely distributed ninth-grade pupils. Their composite distribution is shown in Chart VIII I.

CHART VIII I 

COMPOSITE CURVE FOR THE NINTH GRADE 

BASED UPON NINE SINGLE CURVES. THE BROKEN LINE 
INDICATES THE THEORETICAL NORMAL CURVE. 
In all these cases of normality mentioned, and in innumerable other cases which could be cited, it is obvious that selective and environmental pressure has been operating. Geneticists have commonly secured point, or near-point, distributions, but their plant and animal products are favored and not subject to the laws of survival of the plant and animal in the wild state. Many, probably most and possibly all, of the mutative point characters of the wild-type fruit fly are noncontributive to survival in the wild state. Institutional care of the feeble-minded fruit fly, the eyeless one, the wingless one, etc., permits survival. In the wild type, such essential characters as wing-spread, weight, etc., as condition survival betray unimodal symmetrical or slightly skewed distributions. The experimental facts of genetics and medicine and the observable facts of stable plant, animal, and human life suggest that elementary causes (genes, types of infection, specific foods, vitamins, etc.), which, if they could be singly operative, would lead to point phenomena, do in fact so combine with innumerable other elementary or complex causes as to yield adjustment and survival traits which are of the unimodal symmetrical or slightly skewed types. A notable exception to this is sex, which characteristically yields a two-humped, camel-back distribution, one hump for male and the other for female. Except for certain mental associates of sex and certain non-sex-linked characteristics which are unimportant for survival, such as taste sensitivity, psychological investigation and experimentation have not as yet established the existence of mental traits of this two-humped type, though innumerable researches have been such as to permit the betrayal of bi- or multimodality, if present. The characteristic distributions of sociology and economics are unimodal, as also are most, but not all, of those
of physiology and medicine. We cannot conclude 
that normality of distribution is universal in 
biological and social phenomena, though we may 
confidently expect to find distributions which 
do not differ greatly from it. 

SECTION 3. THE NORMAL DISTRIBUTION AS RELATED
TO ADVANCED STATISTICAL THEORY

Every experimental situation reduces to that of the significance of some difference. There are two aspects of significance: (a) the magnitude and (b) the trustworthiness of the difference. Depending upon the problem, the practical issue is (a) or (b) or an issue involving both. To interpret magnitude some form of parent population must be postulated, and in many situations this is the normal distribution. To judge of significance some law governing the errors of observation and of measurement must be postulated, and for cogent reasons this is taken to be a normal distribution of errors in large samples and the appropriate derivatives therefrom for small samples. The t-distribution, shown through the courtesy of Dr. Phillip J. Rulon in Chart VIII II, is appropriate for interpreting the significance of means, differences of means, and regression coefficients for small samples, say N less than 15. It is the distribution of the ratio of such a statistic to its estimated standard error, when the statistic is computed from a small sample drawn from a parent normal distribution. The $\chi^2$ distribution, shown in Chart VIII III, is that of the sum of the squares of independent variables, all under the assumption that these variables separately are normally distributed. The highly serviceable distribution of the ratio of two variances is another derivative of normally distributed measures.

CHART VIII II

The t-Curve

CHI-SQUARE DISTRIBUTIONS FOR VARIOUS DEGREES OF FREEDOM (d.o.f.)

* Let $X_1, X_2, \ldots, X_n$ be uncorrelated, but associated, variables (i.e., $r_{12}$, etc., $= 0$), each having a mean of zero and a standard deviation of one, and each normally distributed. Then Curve d.o.f. = 1 is the distribution of $X_1^2$. For this case the degrees of freedom, or number of dimensions in which variability takes place, = 1, and the distribution of $X_1$ is normal. Curve d.o.f. = 2 is the distribution of $(X_1^2 + X_2^2)/2$. Curve d.o.f. = 3 is the distribution of $(X_1^2 + X_2^2 + X_3^2)/3$. Etc. The curve for 30 or more degrees of freedom is not, on the scale used, visually distinguishable from a normal curve.
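The footnote to the chi-square chart describes each curve as the distribution of a mean of squared unit normal deviates. A minimal simulation sketch in Python (the function name, sample size, and seeds are arbitrary choices for illustration) shows the mean of such values staying near 1 while their variance, 2/d.o.f., shrinks as the degrees of freedom grow, which is why the curves tighten about 1 for many degrees of freedom:

```python
import random
import statistics

def mean_squares(dof, n_draws, seed=0):
    """Draw n_draws values of (X1**2 + ... + Xdof**2) / dof, where the X's are
    independent unit normal deviates (mean 0, standard deviation 1)."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(dof)) / dof
            for _ in range(n_draws)]

for dof in (1, 2, 3, 30):
    values = mean_squares(dof, 20000)
    # Theory: mean 1 and variance 2/dof for every curve.
    print(dof, round(statistics.fmean(values), 3),
          round(statistics.pvariance(values), 3), round(2 / dof, 3))
```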

Thus, in practical statistics, the normal 
distribution directly serves many situations and, 
by being the parent from which other distribu- 
tions are derived, indirectly serves the field 
of contingency, analysis of variance, small sam- 
ple theory, and still many other fields of ad- 
vanced statistics. 

SECTION 4. THE NORMAL DISTRIBUTION AS RELATED
TO THE DESIGN OF EXPERIMENTS

In general, when an experiment is drawn up, simplicity and directness of treatment and interpretation will be served if

(a) the crucial variable can be analyzed into independent rather than correlated parts;

(b) where correlations exist, regressions, as explained in Chapter XI, are linear, not curvilinear;

(c) distributions are either rectangular or normal, not asymmetrical, bimodal, or of unusual kurtosis;

(d) residual errors are normally distributed, or, as in the case of t, $\chi^2$, and F, the variance ratio, distributed as some function of a normal distribution.

Not infrequently a mathematical transformation of an initially non-normal variable into one normally distributed will, at the same time, make (a) possible and bring about (b). Thus in all four connections the normal distribution plays an important and frequently crucial role.
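As an illustration of such a transformation (the lognormal example, seed, and sample size are assumptions chosen for the sketch, not data from the text), a positively skewed variable whose logarithm is normal becomes symmetrical when logarithms are taken. The moment-ratio skewness below is the sample analogue of Fisher's $\gamma_1$ of [8:18]:

```python
import math
import random

def skewness(values):
    """Moment-ratio skewness m3 / m2**1.5; near zero for a symmetrical distribution."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

rng = random.Random(1)
raw = [rng.lognormvariate(0.0, 1.0) for _ in range(50000)]  # markedly skewed to the right
logged = [math.log(v) for v in raw]                          # normally distributed by construction
print(round(skewness(raw), 2), round(skewness(logged), 3))
```

Which transformation (logarithm, square root, reciprocal, and so on) is appropriate depends on the variable; the logarithm is used here only because it exactly normalizes this constructed example.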

SECTION 5. DERIVATION AND TABLES OF THE NORMAL CURVE

There are innumerable ways of deriving the equation of the normal curve. Since all of these reveal it to be the limit of some process, the mathematical concepts of infinite series, limits, and the calculus seem necessary to a complete understanding of its derivation. It has frequently been shown that the successive terms of the binomial $(p+q)^n$ yield the distribution of events when random samples are drawn, and also when errors of observation or of measurement are of a chance nature. It has also frequently been shown that this distribution, when n becomes large, approaches a normal distribution.
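The approach of the binomial to the normal curve can be watched numerically. The sketch below (Python; the helper name and the choice p = .5 are illustrative assumptions) compares each binomial term with the normal ordinate having the same mean, np, and standard deviation, $\sqrt{npq}$, and reports the largest gap, which shrinks as n grows:

```python
import math

def max_gap(n, p=0.5):
    """Largest absolute difference between the binomial terms C(n,k) p^k q^(n-k)
    and the normal ordinate with the same mean np and standard deviation sqrt(npq).
    Intended for moderate n (a few hundred), well before floating-point underflow."""
    q = 1.0 - p
    mean = n * p
    sd = math.sqrt(n * p * q)
    gap = 0.0
    for k in range(n + 1):
        term = math.comb(n, k) * p ** k * q ** (n - k)
        ordinate = math.exp(-((k - mean) / sd) ** 2 / 2) / (sd * math.sqrt(2 * math.pi))
        gap = max(gap, abs(term - ordinate))
    return gap

for n in (10, 100, 400):
    print(n, f"{max_gap(n):.6f}")
```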

The accompanying derivation, which is that of 
the simplest of the Pearson system of curves, 
nicely illustrates the mathematical simplicity 
of the normal curve. 

If, at any point in a continuous curve $y = f(x)$, a tangent to the curve is drawn, the slope, or tangent of the angle which it makes with the x axis, is approximately $\delta y/\delta x$, the ratio of the small change taking place in y attendant upon a small change in x. The slope is exactly this ratio as the change in x, viz., $\delta x$, is made infinitely small. The calculus notation is

$$\text{slope} = \frac{dy}{dx}$$

We will observe that the slope properties of a symmetrical, unimodal curve, with origin at the mean, whose tails approach ever more closely to the x axis, are that the slope equals zero when x = 0 and also when y = 0 (which y does when $x = \infty$ or $-\infty$). Algebraically, the simplest slope equation, called a differential equation, having these properties is

$$\frac{dy}{dx} = -cxy \qquad [8:01]$$


in which c is some positive constant. Corresponding to this equation expressing the slope properties is another equation, its integral, giving the relationship between y and x. This equation is

$$y = k\,e^{-cx^2/2}$$

in which k is some constant and e is 2.71828..., the Napierian base of logarithms. If we now choose c so that the standard deviation of the distribution is equal to 1.00 (which requires c = 1), so choose k that the total area under the curve is equal to 1.00 (which requires $k = 1/\sqrt{2\pi}$), and substitute z for y merely to avoid certain ambiguities in notation which would otherwise arise, we have

$$z = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2} \qquad \text{(Equation of unit normal distribution)} \qquad [8:02]$$

The area to the left of any point x is designated p and is given by the integral of [8:02]. Thus

$$p = \int_{-\infty}^{x} z\,dx = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-x^2/2}\,dx \qquad \text{(Normal probability of a measure being less than } x) \qquad [8:03]$$

Of course q (= 1 - p) is the area to the right of x and is the probability of a measure being greater than x. Since the curve is symmetrical it is only necessary to table x and z values for p between .5 and 1.0, or q between .5 and 0.0, as has been done in Table XV C. For x and z corresponding to a p value less than .5, one enters the table with the complement of p, viz., with q, and finds the value of z and, after attaching a minus sign, that of x.
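Python's standard-library `statistics.NormalDist` can stand in for Table XV C in a sketch. Below, `pdf` gives z [8:02], `cdf` gives p [8:03], and `inv_cdf` gives x from p; the helper `x_and_z_from_p` is a hypothetical name that imitates the table rule just described, entering with q when p is less than .5 and attaching a minus sign to x:

```python
from statistics import NormalDist

UNIT = NormalDist(0.0, 1.0)  # the unit normal distribution

def z_of(x):
    """Ordinate of the unit normal curve, [8:02]."""
    return UNIT.pdf(x)

def p_of(x):
    """Area to the left of x, [8:03]."""
    return UNIT.cdf(x)

def x_and_z_from_p(p):
    """Imitate a table listing only p >= .5 (0 < p < 1): for p < .5 enter with
    q = 1 - p, then attach a minus sign to x (z is the same by symmetry)."""
    if p >= 0.5:
        x = UNIT.inv_cdf(p)
        return x, z_of(x)
    x_upper, z = x_and_z_from_p(1.0 - p)
    return -x_upper, z

print(round(z_of(0.0), 4), round(p_of(1.0), 4), x_and_z_from_p(0.25))
```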

The two most common requirements of practical statistics are for values of x and z, knowing p, and for values of z and p, knowing x. Kelley's extensive Table of Normal Probability Functions (1947) gives directly x and z to eight decimal places for an argument p of four decimal places. Entering Kelley's table with x yields p and z, but not as conveniently as, e.g., does Sheppard's Table (Pearson, 1914), in which the argument is x, to two decimal places, and the tabled entries are p [called in his notation $.5(1+\alpha)$] and z to seven decimal places.

Of the many other tables of the normal curve 
the early and truly colossal table, considering 
the mechanical aids available at the time, of 
James Burgess (1897-98) and the later extensive 
tables of the National Bureau of Standards (1941 
and 1942) are especially notable. 

If we modify [8:02] so that the standard deviation is $\sigma$ instead of 1.0 and the area N instead of 1.0, and call the ordinate y, the equation is

$$y = \frac{N}{\sigma\sqrt{2\pi}}\,e^{-x^2/(2\sigma^2)} \qquad \text{(The normal equation with origin at the mean, number of cases } = N\text{, and standard deviation } \sigma) \qquad [8:04]$$

If X is a raw score and M the mean, then x = X - M, and we may substitute X - M for x in [8:04] and obtain the equation expressed in terms of raw scores.
SECTION 6. CERTAIN PROPERTIES OF THE NORMAL
DISTRIBUTION

The exponential nature of the normal equation would seem, to the uninitiated, to suggest that its manipulation would be somewhat involved, but this is not the case, because the intrinsic relationships obtaining in the curve are extremely simple, being equaled in simplicity only by those inherent in the rectangular and in the Poisson distributions, and because very adequate tables of the normal probability functions are available to the statistician. Just as logarithms became a daily device when adequate tables were published, so the normal curve has become such a tool because of the tabling of its functions.

We let $\mu'_n$ be the n'th moment from the raw score origin and $\mu_n$ the n'th moment from the mean. The successive moments of the normal distribution [8:04] are as follows:

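$$\mu_0 = \int_{-\infty}^{\infty} y\,dx = N, \text{ the number of cases} \qquad [8:05]$$

Sometimes $\mu_0$ is defined as $\frac{1}{N}\int_{-\infty}^{\infty} y\,dx$, in which case it of course = 1.0.

$\mu'_1 = M$, the mean.

$$\mu_1 = \frac{1}{N}\int_{-\infty}^{\infty} y\,x\,dx = 0, \text{ and every other odd moment} = 0 \qquad [8:06]$$

$$\text{The mean deviation} = \frac{2}{N}\int_{0}^{\infty} y\,x\,dx = \sqrt{\frac{2}{\pi}}\,\sigma \quad (\text{see } [8:20]) \qquad [8:07]$$

$$\mu_2 = \frac{1}{N}\int_{-\infty}^{\infty} y\,x^2\,dx = V = \sigma^2 \qquad [8:08]$$

$$\text{Mean } |x^3| = \frac{2}{N}\int_{0}^{\infty} y\,x^3\,dx = 2\sqrt{\frac{2}{\pi}}\,\sigma^3 \qquad [8:09]$$

$$\mu_4 = \frac{1}{N}\int_{-\infty}^{\infty} y\,x^4\,dx = 3V^2 \qquad [8:10]$$

$$\text{Mean } |x^5| = \frac{2}{N}\int_{0}^{\infty} y\,x^5\,dx = \sqrt{\frac{128}{\pi}}\,\sigma^5 \qquad [8:11]$$

Every even moment is derivable from V and the even moment which is two less than the original moment, through the relationship

$$\mu_n = (n-1)\,V\,\mu_{n-2}, \text{ wherein } n \text{ is even} \qquad [8:12]$$

so that

$$\mu_6 = 15V^3 \qquad [8:13]$$

$$\mu_8 = 105V^4 \qquad [8:14]$$

$$\mu_{10} = 945V^5, \text{ etc.} \qquad [8:15]$$

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3} = 0 \qquad \text{(Normal skewness, Pearson)} \qquad [8:16]$$

$$\beta_2 = \frac{\mu_4}{\mu_2^2} = 3 \qquad \text{(Normal kurtosis, Pearson)} \qquad [8:17]$$

$$\gamma_1 = \frac{\mu_3}{\mu_2^{3/2}} = 0 \qquad \text{(Normal skewness, Fisher)} \qquad [8:18]$$

$\gamma_1$ is a theoretical value and Fisher's sample estimate of it is called $g_1$.

$$\gamma_2 = \frac{\mu_4}{\mu_2^2} - 3 = 0 \qquad \text{(A measure of normal kurtosis, Fisher)} \qquad [8:19]$$

$\gamma_2$ is a theoretical value and Fisher's sample estimate of it is called $g_2$.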

The equations just given for $\mu_0$, $\mu'_1$, $\mu_1$, and $\mu_2$ hold for all distributions, those for the odd moments for all symmetrical distributions, and that for $\mu_4$ for all mesokurtic distributions. The normal distribution is that particular symmetric mesokurtic distribution for which the reduction equation [8:12] holds.
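The reduction formula [8:12] can be confirmed numerically. The following is a rough sketch, not a method from the text: the trapezoid rule applied to [8:04] with an assumed $\sigma = 2$, so that V = 4, over a range wide enough that the tails are negligible:

```python
import math

def normal_moment(n, sigma, steps=20000, width=12.0):
    """(1/N) times the integral of y * x**n over the normal curve [8:04],
    by the trapezoid rule on -width*sigma .. +width*sigma (the tails beyond are negligible)."""
    a = -width * sigma
    h = 2 * width * sigma / steps
    total = 0.0
    for k in range(steps + 1):
        x = a + k * h
        y = math.exp(-x * x / (2 * sigma * sigma)) / (sigma * math.sqrt(2 * math.pi))
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * y * x ** n
    return total * h

sigma = 2.0
V = sigma ** 2
for n in (4, 6, 8, 10):
    # Reduction formula [8:12]: mu_n = (n - 1) * V * mu_(n-2)
    print(n, normal_moment(n, sigma), (n - 1) * V * normal_moment(n - 2, sigma))
```

For n = 10 both columns come to $945V^5$, the value given in [8:15].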

An approximate plotting of a normal curve can readily be made from the values at five points, as follows:

x                               -2.50     -1.00      .00      1.00      2.50
z                               .0175      .24       .40       .24      .0175
Ratio of z to z at the mean      1/25      3/5      1.00       3/5       1/25

When plotting, the slope is, of course, to be made zero at x = 0, and the points of inflexion of the curve are at -1 (i.e., $-1\sigma$) and +1 (i.e., $+1\sigma$).
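The five plotting values, and their ratios to the central ordinate, can be recomputed directly from [8:02]; the rounded fractions 3/5 and 1/25 are approximations to $e^{-1/2} \approx .607$ and $e^{-3.125} \approx .044$. A short sketch:

```python
import math

def z(x):
    """Ordinate of the unit normal curve, [8:02]."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

for x in (-2.5, -1.0, 0.0, 1.0, 2.5):
    print(f"x = {x:5.2f}   z = {z(x):.4f}   z / z(0) = {z(x) / z(0.0):.4f}")
```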

The relationships between the standard deviation and other measures of variability are, in the normal distribution, as given herewith:

$$\text{A.D.} = \sqrt{\frac{2}{\pi}}\,\sigma = .79788456\,\sigma \qquad [8:20]$$

$$Q = .67448975\,\sigma \qquad [8:21]$$

$$P_{93} - P_{07} = 2.95158206\,\sigma \qquad [8:22]$$

The very slightly more precise normal optimal interpercentile range is

$$P_{.9308395} - P_{.0691605} = 2.9641450\,\sigma \qquad [8:22a]$$
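The constants in [8:20] to [8:22a] can be regenerated from the standard library's `NormalDist` (a sketch; the variable names are illustrative). Each interpercentile range is the difference of two inverse-normal values:

```python
import math
from statistics import NormalDist

UNIT = NormalDist()  # mean 0, standard deviation 1

mean_deviation = math.sqrt(2 / math.pi)                             # [8:20]
quartile_deviation = UNIT.inv_cdf(0.75)                             # [8:21]
range_93_07 = UNIT.inv_cdf(0.93) - UNIT.inv_cdf(0.07)               # [8:22]
optimal_range = UNIT.inv_cdf(0.9308395) - UNIT.inv_cdf(0.0691605)   # [8:22a]
print(f"{mean_deviation:.8f} {quartile_deviation:.8f} "
      f"{range_93_07:.8f} {optimal_range:.7f}")
```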


The relative merit of these measures of vari- 
the magnitude $.67449\sigma$ has been called a "probable error."

$$\text{P.E.} = .6744898\,\sigma \qquad [8:24]$$

The precision of a statistic may be indicated by attaching the probable error to it, thus: 14.7 ± 1.2. An alternative procedure is to give its standard deviation, called a standard error when the deviation in question is conceived of as an error, thus: 14.7 (st.er. = 1.8). Since this second procedure is arithmetically simpler, as it does not involve multiplication by .67449, and is more accurate, as it does not by implication assume a normal distribution of errors, it will be used in this text. The designation of the standard error by the ± sign, though found, is definitely bad practice, in that this sign has earlier and systematically been employed in connection with the probable error.

The probable error and the quartile deviation have been confused. $Q = (Q_3 - Q_1)/2$, one-half the difference between the first and the third quartiles in any distribution, and P.E. $= .67449\sigma$ in any distribution. Q and P.E. happen to be equal in the case of the normal distribution, but in general they are not equal and should not be used interchangeably.

We may note three degrees of refinement in judging the precision of a statistic: (a) reporting its P.E., and thus by implication assuming a normal distribution of errors; (b) reporting its standard error, thus not stipulating the form of distribution of errors; and (c) reporting the presumptive form of distribution of errors (perhaps a t-distribution, a chi-square distribution, or a variance ratio distribution). The third procedure is highly meritorious, especially when small samples (or a small number of degrees of freedom) are involved, but most problems of significance of a statistic are answered by knowledge of its standard error, so the student should become thoroughly familiar with the computation of the standard deviations of all sorts of statistics.

The ratio of a statistic to its standard error 
is called a critical ratio. 
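In code the two conventions differ only by a constant factor. A minimal sketch (the function names are illustrative) using the text's example of 14.7 with a standard error of 1.8:

```python
PE_FACTOR = 0.6744898  # [8:24]: P.E. = .6744898 times the standard error

def probable_error(standard_error):
    """Probable error corresponding to a standard error (assumes normal errors)."""
    return PE_FACTOR * standard_error

def critical_ratio(statistic, standard_error):
    """Ratio of a statistic to its standard error."""
    return statistic / standard_error

# The text's example: 14.7 with standard error 1.8 (probable error about 1.2).
print(round(probable_error(1.8), 2), round(critical_ratio(14.7, 1.8), 2))
```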

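In medical and chemical situations it is conceivable that some virus, bacterium, or other substance may have an effect that is not proportional to the frequency of the substance, but to some power of the frequency. This suggests that we study the distribution of $y^r$, where r is some positive integral or fractional magnitude and y is the ordinate in a normal frequency curve,

$$y = \frac{N}{\sigma\sqrt{2\pi}}\exp\left(-\frac{x^2}{2\sigma^2}\right)$$

For r = 2,

$$y^2 = \frac{N^2}{2\pi\sigma^2}\exp\left(-\frac{x^2}{\sigma^2}\right) = \frac{N^2}{2\sigma\sqrt{\pi}}\cdot\frac{1}{(\sigma/\sqrt{2})\sqrt{2\pi}}\exp\left(-\frac{x^2}{2(\sigma/\sqrt{2})^2}\right)$$

which is a normal distribution with variance $.5\sigma^2$ and number of cases $N^2/(2\sigma\sqrt{\pi})$. In general,

$$y^r = \left(\frac{N}{\sigma\sqrt{2\pi}}\right)^r \exp\left(-\frac{r x^2}{2\sigma^2}\right) \qquad [8:24]$$

This also is a normal distribution, with variance $\sigma^2/r$.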
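The result, that $y^r$ is normal with variance $\sigma^2/r$ and area $(N/(\sigma\sqrt{2\pi}))^r\,\sigma\sqrt{2\pi/r}$ (the area follows by integrating the general expression for $y^r$; for r = 2 it reduces to $N^2/(2\sigma\sqrt{\pi})$), can be checked numerically. The sketch assumes N = 100 and $\sigma = 1.5$ purely for illustration:

```python
import math

def normal_curve(x, n_cases, sigma):
    """Ordinate y of [8:04]: N cases, standard deviation sigma, origin at the mean."""
    return n_cases / (sigma * math.sqrt(2 * math.pi)) * math.exp(-x * x / (2 * sigma * sigma))

def area_and_variance_of_power(r, n_cases, sigma, steps=40000, width=12.0):
    """Trapezoid-rule area under y**r, and its second moment about x = 0 divided by that area."""
    a = -width * sigma
    h = 2 * width * sigma / steps
    area = second = 0.0
    for k in range(steps + 1):
        x = a + k * h
        weight = h / 2 if k in (0, steps) else h
        yr = normal_curve(x, n_cases, sigma) ** r
        area += weight * yr
        second += weight * yr * x * x
    return area, second / area

n_cases, sigma = 100.0, 1.5
for r in (0.5, 2.0, 3.0):
    area, variance = area_and_variance_of_power(r, n_cases, sigma)
    predicted = (n_cases / (sigma * math.sqrt(2 * math.pi))) ** r * sigma * math.sqrt(2 * math.pi / r)
    print(r, round(area, 6), round(predicted, 6), round(variance, 6), sigma ** 2 / r)
```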

SECTION 7. STATISTICAL CONSTANTS DESCRIPTIVE
OF PORTIONS OF A NORMAL DISTRIBUTION

CHART VIII IV 




Chart VIII IV represents a unit normal distribution. The magnitude x is a standard score, the height of the curve at the point x is z, and the proportion of the total area above this point is q. The mean of the measures above $x_i$ is $d_i$, and is approximately the distance labeled $d_i$ in the chart. The mean of the tail above $x_j$ is $d_j$, and the mean of the portion between $x_i$ and $x_j$ is $d_{ij}$. Knowing q, the magnitudes x and z are gotten from a table of normal probability functions. A brief table of these functions is given in Table XV C.

A formula giving d, the mean deviation of a tail portion, is readily derived by means of the calculus:

$$d = \frac{1}{q}\int_{x}^{\infty} xz\,dx = \frac{z}{q} \qquad \text{(The mean deviation of a tail portion of a unit normal distribution)} \qquad [8:25]$$


If the tail is greater than one-half the curve, the formula becomes

$$d = \frac{z}{p} \qquad [8:26]$$

The sign of d is positive if the tail is toward the right and negative if toward the left, when the x scale runs from left to right.

Having d, a distance in a unit normal distribution, the corresponding deviation from the mean, if the standard deviation is $\sigma$, is simply $d\sigma$.
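Formula [8:25] can be checked against direct numerical integration of the tail. The sketch below (Python standard library only; the names and the hypothetical standard deviation of 10 are illustrative) does so for three tail sizes. For q = .5 the result is .79788, the normal mean deviation of [8:20], as it should be:

```python
from statistics import NormalDist

UNIT = NormalDist()

def tail_mean(q):
    """Mean standard score of the upper tail holding proportion q: d = z / q, [8:25]."""
    x = UNIT.inv_cdf(1.0 - q)  # point of dichotomy
    return UNIT.pdf(x) / q

def tail_mean_by_integration(q, steps=50000, upper=12.0):
    """Same quantity by the midpoint rule applied to (1/q) * integral of t z(t) dt from x to upper."""
    x = UNIT.inv_cdf(1.0 - q)
    h = (upper - x) / steps
    total = sum((x + (k + 0.5) * h) * UNIT.pdf(x + (k + 0.5) * h) for k in range(steps))
    return total * h / q

for q in (0.05, 0.27, 0.5):
    print(q, round(tail_mean(q), 6), round(tail_mean_by_integration(q), 6))

# Hypothetical trait with standard deviation 10: the top 27 per cent lies on average d * 10 above the mean.
print(round(tail_mean(0.27) * 10, 3))
```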

Let $q_i$ be the proportion to the right of $x_i$, $q_j$ the proportion to the right of $x_j$, and $q_{ij}$ the proportion in the interval from $x_i$ to $x_j$, i.e., $q_{ij} = q_i - q_j$. Since the mean of a sum is the average of the means of the parts, each weighted according to the number of cases, we have

$$d_i q_i = d_{ij}(q_i - q_j) + d_j q_j$$

which with [8:25] yields

$$d_{ij} = \frac{z_i - z_j}{q_{ij}} \qquad \text{(The mean deviation of a portion of a unit normal distribution)} \qquad [8:27]$$

As employed in this text q is generally a proportion less than .5, but in [8:27] it is the proportion to the right of the point of dichotomy and may be less or greater than .5. When so employed, the sign of $d_{ij}$ is correctly given by the sign of $(z_i - z_j)$.

Table VIII D herewith illustrates the transformation of an ordered variable (teachers' marks A, B, C, D, E, F) into quantitative values, under the assumption that the talent represented by the ordered data is normally distributed. The mark A corresponds to a position 1.692 standard deviations above the mean, etc.





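TABLE VIII D

NORMALIZING AN ORDERED VARIABLE

Mark   Per cent    q_i     q_j      z_i        z_j        d_ij

A        11.4      .114    .000    .192900    .000000     1.692
B        34.7      .461    .114    .397034    .192900      .588
C        32.5      .786    .461    .291399    .397034     -.325
D        10.2      .888    .786    .190478    .291399     -.989
E         9.0      .978    .888    .052485    .190478    -1.533
F         2.2     1.000    .978    .000000    .052485    -2.386

($q_i$ is the proportion receiving the mark or a higher one, $q_j$ the proportion receiving a higher mark, $z_i$ and $z_j$ the corresponding unit normal ordinates, and $d_{ij}$ the mean standard score of those receiving the mark, by [8:27].)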
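The whole of Table VIII D follows mechanically from the percentages. A sketch in Python (standard library only; the function names are illustrative) cumulates the percentages from the top mark down, finds each boundary ordinate, and applies [8:27]:

```python
from statistics import NormalDist

UNIT = NormalDist()

def ordinate_at_q(q):
    """Unit normal ordinate z at the point with proportion q to its right (0 at either extreme)."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return UNIT.pdf(UNIT.inv_cdf(1.0 - q))

def scale_values(percents):
    """Per cent of cases in each ordered category, best first -> mean standard score
    d_ij = (z_i - z_j) / q_ij of each category, by [8:27]."""
    values = []
    q_j = 0.0  # proportion in all higher categories
    for pct in percents:
        q_ij = pct / 100.0
        q_i = q_j + q_ij
        values.append((ordinate_at_q(q_i) - ordinate_at_q(q_j)) / q_ij)
        q_j = q_i
    return values

marks = "ABCDEF"
percents = [11.4, 34.7, 32.5, 10.2, 9.0, 2.2]
for mark, d in zip(marks, scale_values(percents)):
    print(mark, f"{d:.3f}")
```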


It is frequently desirable to have further statistics of portions of a normal distribution. We herewith give the calculus statement for obtaining the variance, the third moment, and the fourth moment of a portion bounded by $x_i$ and $x_j$. The calculus formulas are given to illustrate the derivation, but are not needed in the practical utilization of the method.

The mean of the squared deviations of the $q_{ij}$ measures in the interval $x_i$ to $x_j$ from the mean of the entire distribution is $V'_{ij}$, and from their own mean it is $\sigma^2_{ij}$, as given by [8:29].


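$$V'_{ij} = \frac{1}{q_{ij}}\int_{x_i}^{x_j} x^2 z\,dx = 1 + \frac{x_i z_i - x_j z_j}{q_{ij}} \qquad \text{(Mean squared deviation, from total distribution mean, of a portion of a unit normal distribution)} \qquad [8:28]$$

Accordingly

$$\sigma^2_{ij} = V'_{ij} - d^2_{ij} = 1 + \frac{x_i z_i - x_j z_j}{q_{ij}} - d^2_{ij} \qquad \text{(Variance of a portion of a unit normal distribution)} \qquad [8:29]$$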


By very similar procedures we obtain the third and fourth moments of portions:

$$\mu'_{3:ij} = \frac{1}{q_{ij}}\int_{x_i}^{x_j} x^3 z\,dx = \frac{x_i^2 z_i - x_j^2 z_j}{q_{ij}} + 2d_{ij} \qquad \text{(Mean cubed deviation, from total distribution mean, of a portion of a unit normal distribution)} \qquad [8:30]$$


* Having p, q, x, and z as defined in connection with a unit normal distribution, we have

$$dp = z\,dx \qquad [8:28a]$$

$$dz = -xz\,dx \qquad [8:28b]$$

The various integrals here needed are then obtained by integrating by parts and evaluating at the limits.
$$\mu_{3:ij} = \mu'_{3:ij} - 3V'_{ij}d_{ij} + 2d^3_{ij} = \frac{x_i^2 z_i - x_j^2 z_j}{q_{ij}} + (2 - 3\sigma^2_{ij})\,d_{ij} - d^3_{ij} \qquad \text{(Third moment of a portion of a unit normal distribution)} \qquad [8:31]$$

$$\mu'_{4:ij} = \frac{1}{q_{ij}}\int_{x_i}^{x_j} x^4 z\,dx = \frac{x_i^3 z_i - x_j^3 z_j}{q_{ij}} + 3V'_{ij} \qquad \text{(Mean fourth powered deviation, from total distribution mean, of a portion of a unit normal distribution)} \qquad [8:32]$$

$$\mu_{4:ij} = \mu'_{4:ij} - 4\mu'_{3:ij}d_{ij} + 6V'_{ij}d^2_{ij} - 3d^4_{ij}$$

$$= \frac{1}{q_{ij}}\left[x_i^3 z_i - x_j^3 z_j - 4(x_i^2 z_i - x_j^2 z_j)\,d_{ij} + 6(x_i z_i - x_j z_j)\,d^2_{ij} + 3(x_i z_i - x_j z_j)\right] + 3 - 2d^2_{ij} - 3d^4_{ij} \qquad \text{(Fourth moment of a portion of a unit normal distribution)} \qquad [8:33]$$
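Since the algebra behind [8:28] to [8:33] is lengthy, it is worth confirming the formulas numerically. The sketch below (Python; the two intervals are arbitrary choices) evaluates the closed forms and compares them with direct Simpson's-rule integration over the same portion of the unit normal curve; the two lines printed for each interval should agree to many decimals:

```python
import math

def phi(x):
    """Unit normal ordinate z."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def cdf(x):
    """Unit normal area to the left of x."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def closed_form(xi, xj):
    """Mean d_ij and central moments 2, 3, 4 of the portion between xi < xj, by [8:27]-[8:33]."""
    zi, zj = phi(xi), phi(xj)
    q = cdf(xj) - cdf(xi)                           # q_ij

    def a(k):                                       # x_i**k z_i - x_j**k z_j
        return xi ** k * zi - xj ** k * zj

    d = a(0) / q                                    # [8:27]
    var = 1 + a(1) / q - d * d                      # [8:28], [8:29]
    m3 = a(2) / q + (2 - 3 * var) * d - d ** 3      # [8:31]
    m4 = ((a(3) - 4 * a(2) * d + 6 * a(1) * d * d + 3 * a(1)) / q
          + 3 - 2 * d * d - 3 * d ** 4)             # [8:33]
    return d, var, m3, m4

def by_integration(xi, xj, steps=20000):
    """The same four quantities by Simpson's rule over the portion (steps must be even)."""
    h = (xj - xi) / steps
    xs = [xi + k * h for k in range(steps + 1)]
    ws = [(1 if k in (0, steps) else 4 if k % 2 else 2) * h / 3 for k in range(steps + 1)]
    mass = sum(w * phi(x) for w, x in zip(ws, xs))
    mean = sum(w * phi(x) * x for w, x in zip(ws, xs)) / mass

    def central(r):
        return sum(w * phi(x) * (x - mean) ** r for w, x in zip(ws, xs)) / mass

    return mean, central(2), central(3), central(4)

for xi, xj in ((-0.5, 1.2), (0.3, 2.0)):
    print([round(v, 8) for v in closed_form(xi, xj)])
    print([round(v, 8) for v in by_integration(xi, xj)])
```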
SECTION 8. THE SIGNIFICANCE OF DIFFERENCES BETWEEN
THE MEANS OF TAIL PORTIONS OF A NORMAL DISTRIBUTION

Let us be given a sample of N cases, upon each of which has been made a measurement, X, with, so far as we can tell, the same precision throughout, so that the standard error of measurement, s, may be taken to be constant for all measures. The variance, due to the error inherent in the technique of measurement, of the mean of any subsample of $N_i$ cases is $s^2/N_i$, and is not a function of the distribution of these $N_i$ cases, nor of the distribution of the N cases.

Let the mean of an upper tail portion of $N_i$ cases be $D_i$ and the mean of a lower, non-overlapping tail portion of $N_j$ cases be $D_j$. The difference between the means is $(D_i - D_j)$ and the error variance of this difference is $(s^2/N_i + s^2/N_j)$. The critical ratio of the difference divided by its standard error is

$$\frac{D_i - D_j}{s\sqrt{\dfrac{1}{N_i} + \dfrac{1}{N_j}}}$$

where the error in question is that due to measurement and not due to sampling.

If the sample is normally distributed, we can readily find the size of the tail portions which makes this critical ratio a maximum. It is axiomatic that the upper and lower portions will be equal. Let the X's be expressed in terms of standard scores. Let $x_i$ be the point of dichotomy for the upper tail, $q_i N$ the number of cases in the tail, and $d_i$, as given by [8:25], the mean of the tail. For the lower tail the point of dichotomy is $-x_i$, the number of cases $q_i N$, and the mean $-d_i$. The critical ratio of the difference between means divided by its standard error is

$$\frac{d_i - (-d_i)}{s\sqrt{\dfrac{1}{q_i N} + \dfrac{1}{q_i N}}} = \frac{z_i}{\sqrt{q_i}}\cdot\frac{\sqrt{2N}}{s}$$

in which N and s are constants. An examination of a table of the normal probability functions z and q enables us to find the value of q for which this function is a maximum. It is the value for which $q = z/(2x)$, i.e., q = .2702678.
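The maximizing proportion can also be located without tables. Setting the derivative of $z/\sqrt{q}$ to zero, using $dz = -xz\,dx$ and $dq = -z\,dx$, gives exactly the condition $q = z/(2x)$; a small bisection sketch in Python (function names are illustrative) solves it:

```python
import math

def z(x):
    """Unit normal ordinate."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def q(x):
    """Proportion of a unit normal distribution to the right of x."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def efficiency(x):
    """z / sqrt(q): the part of the critical ratio that depends on the cut (N and s fixed)."""
    return z(x) / math.sqrt(q(x))

def optimal_cut(lo=0.1, hi=2.0, iterations=100):
    """Bisection on g(x) = 2 x q(x) - z(x); its root is where q = z / (2x)."""
    def g(x):
        return 2 * x * q(x) - z(x)
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

x_best = optimal_cut()
print(round(x_best, 6), round(q(x_best), 7))
```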

Thus when upper and lower groups are selected in order to bring out most decisively the difference between the means of the upper and lower groups, the two groups should consist of 27 per cent each, if the distribution for the sample entire is normal.*

* A fuller proof and further aspects of this matter have been discussed by Truman L. Kelley, "The selection of upper and lower groups for the validation of test items," J. Educ. Psych., Vol. 30, Jan. 1939.

SECTION 9. FITTING A NORMAL CURVE TO DATA

The equation of the fitted curve is

$$y = \frac{N}{\sigma\sqrt{2\pi}}\,e^{-(X-M)^2/(2\sigma^2)} = \frac{N}{\sigma}\,z \qquad [8:34]$$

in which z is the ordinate in a table of the unit normal distribution corresponding to the deviate $x\,[= (X-M)/\sigma]$. It is frequently best to plot this curve for values of X which are class indexes, because then a precise comparison can be made between the computed curve and the actual data. The value of y given by [8:34] is the frequency per unit interval. If the data have been grouped, using an interval i, the frequency called for is

$$iy = \frac{iNz}{\sigma} \qquad [8:35]$$

We will illustrate the steps in the computation, using the New York daily maximum temperature data of Chapter IV, Table IV B. The calculation of the mean and standard deviation, using a grouping interval of 3°, has already been made in Chapter VI, Section 6: M = 81.7742 and $\sigma$ = 6.2358. In Table VIII E herewith we give the class indexes, $X_j$; the equivalent standard scores, $x_j$, for the class indexes; the corresponding ordinates, $z_j$, in the unit normal distribution; and these multiplied by $Ni/\sigma$, providing the theoretical frequencies, $\tilde f_j$, in the classes, which are to be compared with the actual frequencies, $f_j$, as shown. Though a grouping of a number of small theoretical frequency classes is called for, the actual grouping here employed for the upper and lower tail portions is slightly coarser than is desirable, as explained in Chapter IX, Section 5.

$$x_j = \frac{X_j - M}{\sigma} = \frac{X_j - 81.7742}{6.2358} = .16036\,X_j - 13.1137$$

$$\tilde f_j = z_j\,\frac{Ni}{\sigma} = z_j\,\frac{62 \times 3}{6.2358} = 29.8278\,z_j$$


The difference between $f_j$ and $\tilde f_j$ is the cell divergence; $(f_j - \tilde f_j)^2/\tilde f_j$ is the cell square contingency; and the sum of these for all cells is the square contingency, $\chi^2$. Chi-square is the statistic which provides a test of the goodness-of-fit of the observed data to the theoretical curve.

TABLE VIII E

COMPUTATION AND FREQUENCIES FOR NORMAL CURVE
FITTED TO MAXIMUM TEMPERATURE DATA OF TABLE IV G

 X_j       x_j         z_j       f̃_j    Cell f̃   Cell f   (f - f̃)²/f̃

  60    -3.49181    .000898      .03
  63    -3.01071    .00429       .13
  66    -2.52962    .01627       .49     5.59       4          .45
  69    -2.04853    .04894      1.46
  72    -1.56743    .11679      3.48
  75    -1.08634    .22113      6.60     6.60       6          .05
  78     -.60525    .33217      9.91     9.91       5         2.43
  81     -.12415    .39587     11.81    11.81      23        10.60
  84      .35694    .37432     11.17    11.17      13          .30
  87      .83803    .28081      8.38     8.38       6          .68
  90     1.31913    .16713      4.99     4.99       1         3.19
  93     1.80022    .07892      2.35
  96     2.28131    .02957       .88
  99     2.76240    .00879       .26     3.56       4          .05
 102     3.24350    .00207       .06
 105     3.72455    .000388      .01

Total                                   62.01      62         17.75

(Classes 60 to 72, and 93 to 105, are each combined into one cell; the cell entries are shown opposite the middle class of each group.)

$$\chi^2 = 17.75;\quad \text{degrees of freedom} = 5;\quad \frac{\chi^2}{\text{d.o.f.}} = 3.55 = F_{5,\infty};\quad P = .0034$$

The frequencies of the observed distribution for the successive classes are recorded in the $f_j$ column, and the theoretical frequencies in the $\tilde f_j$ column.
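The whole of Table VIII E, and the probability, can be recomputed in a few lines. The sketch below (Python; the variable names are illustrative) regroups the tail classes exactly as in the table and, instead of the F-ratio route of Chapter IX, uses the standard closed-form chi-square tail for an odd number (here five) of degrees of freedom. Computed without intermediate rounding, the total theoretical frequency comes to about 62, $\chi^2$ agrees with 17.75 to within a few hundredths, and the tail probability is about .003, in close agreement with the P = .0034 quoted:

```python
import math

# Constants reported in the text for the New York maximum temperature data.
M, SIGMA, N_CASES, INTERVAL = 81.7742, 6.2358, 62, 3

def z(x):
    """Unit normal ordinate, [8:02]."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

class_indexes = list(range(60, 106, 3))                         # 60, 63, ..., 105
theoretical = [N_CASES * INTERVAL / SIGMA * z((X - M) / SIGMA)  # [8:35], i.e. 29.8278 z_j
               for X in class_indexes]

# Cells as grouped in Table VIII E: 60-72 together, 75 to 90 singly, 93-105 together.
cells = [theoretical[0:5]] + [[f] for f in theoretical[5:11]] + [theoretical[11:16]]
cell_theoretical = [sum(c) for c in cells]
cell_observed = [4, 6, 5, 23, 13, 6, 1, 4]

chi_square = sum((f - e) ** 2 / e for f, e in zip(cell_observed, cell_theoretical))
dof = len(cells) - 3  # N, the mean and the standard deviation were fitted from the data

def chi_square_tail_5(x):
    """P(chi-square >= x) for exactly 5 degrees of freedom, via the closed form for odd d.o.f."""
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * (1 + x / 3))

print(round(sum(cell_theoretical), 2), round(chi_square, 2), dof,
      round(chi_square_tail_5(chi_square), 4))
```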

This is not the chapter in which fully to discuss $\chi^2$, degrees of freedom, and the goodness-of-fit test, but it will be noted that, since the test involved eight class frequencies and since the theory accepted three constants determined from the data, namely, N, the mean, and the standard deviation,* there are eight minus three, or five, degrees of freedom. Thus $(\chi^2/5)/1$ is $F_{5,\infty}$, a variance ratio with five degrees of freedom in the numerator variance and an infinite number in the denominator. By methods given in Chapter IX, an equivalent P, the probability that a divergence as great as that observed may be attributed to chance, is found, and this constitutes the final evidence as to goodness-of-fit.

It would commonly be stated that the theory being tested is whether the deviations from normality present in the observed data are of such magnitude as to be attributable to chance, if the observed 62 items were randomly drawn from a normal parent population. This is not quite accurate, for actually the theory being tested concerns itself with that infinite number of subsamples of 62 each of a parent population for which the mean is 81.7742 and the standard deviation is 6.2358. These subsamples themselves constitute an infinite population, and the theory being tested is whether the observed sample is a chance deviate from such subsamples. For present purposes the distinction is important because it accounts for the loss of three degrees of freedom.

For the data in hand, with P = .0034, we see that there is less than 1 chance in 100 that the theory of normality is tenable. Knowing nothing about the source of the data we would draw this conclusion, but knowing the source, and knowing that successive daily maximum temperatures at a single spot are correlated and not independent, we have a ready explanation in that our items are not independent random samplings. Prior to the study we would, in fact, have had cogent grounds for expecting the distribution to be non-normal, even though we do not know just what form to expect.

* $V = (\Sigma f x^2)/N$ is a linear function of the f's.

SECTION 10. THE DISTRIBUTION OF THE SUM OF NORMAL VARIABLES


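Let $x_1$ and $x_2$ each be deviations from their means, normally distributed and independent of each other, with variances $V_1$ and $V_2$, and let $x = x_1 + x_2$. We shall determine the moments of the distribution of $x$:

$$\mu_k = \frac{1}{N}\sum (x_1 + x_2)^k = \frac{1}{N}\sum\left[x_1^k + k\,x_1^{k-1}x_2 + \frac{k(k-1)}{1\cdot 2}\,x_1^{k-2}x_2^2 + \cdots + x_2^k\right]$$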


In getting the $k$th moment of $x$ we observe that all summations involving $x_1$ or $x_2$ to an odd power vanish, because the variables are independent and the summation of odd powers of each separately is zero. Summations involving even powers, $g$ and $h$, can be expressed in parts, thus

$$\sum x_1^g x_2^h = N\,\mu_g(x_1)\,\mu_h(x_2)$$


Finally, utilizing [8:12] in connection with the moments of $x_1$ and $x_2$, we obtain, after making the necessary summations and substitutions, the following $k$th moment of $x$ for even $k$:

$$\mu_k = (k-1)(k-3)\cdots(1)\,(V_1 + V_2)^{k/2}$$

Of course

$$\mu_{k-2} = (k-3)\cdots(1)\,(V_1 + V_2)^{(k-2)/2}$$

Since $\mu_2 = V_1 + V_2$, we obtain

$$\mu_k = (k-1)\,\mu_2\,\mu_{k-2}$$

This is [8:12], the fundamental relationship between the even moments in a normal distribution. We have thus proven that $x$, the sum of two normally distributed independent variables, is normally distributed. It is noteworthy that this holds even though the variables have unequal variances.

If $x$ is the sum of three normally distributed independent variables, then $x$ is normally distributed. The proof is immediate: we can first combine $x_1$ and $x_2$, getting a normally distributed variable, and this combined with $x_3$ yields a normally distributed variable.

Finally, the sum of any number of independent 
normally distributed variables is normally dis- 
tributed. 

If we let $x_j' = -x_j$ and substitute $-x_j'$ for $x_j$ in the sum, the preceding statement obviously holds for differences as well. The proof that adding a constant, or multiplying a variable by a constant, does not alter the relationship is equally simple, so we can assert: Any linear combination of normally distributed independent variables is normally distributed.
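Here is a quick Monte Carlo check of this result, offered as an illustration rather than a proof. The sketch draws independent normal deviates with unequal variances, adds them, and confirms two things: the variance of the sum is near $V_1 + V_2$, and $\mu_4/\mu_2^2$ is near 3, as [8:12] requires for $k = 4$. The values $V_1 = 1$ and $V_2 = 9$, the sample size, and the seed are arbitrary choices.

```python
import random
import statistics

def central_moment(xs, k):
    """k-th moment about the sample mean."""
    m = statistics.fmean(xs)
    return statistics.fmean((x - m) ** k for x in xs)

rng = random.Random(2024)          # fixed seed so the run is reproducible
V1, V2 = 1.0, 9.0                  # unequal variances (illustrative values)
n = 200_000
x = [rng.gauss(0.0, V1 ** 0.5) + rng.gauss(0.0, V2 ** 0.5) for _ in range(n)]

mu2 = central_moment(x, 2)
mu4 = central_moment(x, 4)
print(f"mu2 = {mu2:.3f} (V1 + V2 = {V1 + V2})")
print(f"mu4 / mu2^2 = {mu4 / mu2 ** 2:.3f} (normal value 3)")
```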


SECTION 11. THE RELATIONSHIP OF CHI-SQUARE,
T-SQUARE, AND F TO THE NORMAL DISTRIBUTION 


The purpose of this section is to state and illustrate, but not to prove, these relationships.

Let $\Delta$ be a magnitude of the small order of the differentials of calculus. Let

$$p_{\Delta_1} = \frac{1}{\sigma_1\sqrt{2\pi}}\,e^{-x_1^2/2V_1'}\,\Delta_1$$

This equals the probability of a measure falling in the interval $x_1$ to $x_1 + \Delta_1$. Similarly, for a second variable,

$$p_{\Delta_2} = \frac{1}{\sigma_2\sqrt{2\pi}}\,e^{-x_2^2/2V_2'}\,\Delta_2$$

If $x_1$ and $x_2$ are independent, the probability of the joint occurrence (that is, of obtaining a case falling within the $\Delta_1$ interval for the first variable and within the $\Delta_2$ interval for the second variable) is

$$p_{\Delta_1\Delta_2} = \left[\frac{1}{\sigma_1\sigma_2(2\pi)}\exp\left\{-\frac{1}{2}\left(\frac{x_1^2}{V_1'} + \frac{x_2^2}{V_2'}\right)\right\}\right]\Delta_1\Delta_2$$


The distribution in two-dimensional space of the function within the brackets is related to a $\chi^2$ distribution with two degrees of freedom. In general, the $i$-dimensional distribution of

$$\frac{1}{\sigma_1\sigma_2\cdots\sigma_i\,(2\pi)^{i/2}}\exp\left\{-\frac{1}{2}\left(\frac{x_1^2}{V_1'} + \frac{x_2^2}{V_2'} + \cdots + \frac{x_i^2}{V_i'}\right)\right\}$$

is related to a $\chi^2$ distribution with $i$ degrees of freedom. The use of $V_i'$ instead of $V_i$ is to avoid ambiguity with the more usual meaning of $V_i$ as given in [8:38].

On the two-dimensional surface having dimensions $x_1/\sigma_1$ and $x_2/\sigma_2$, the distance from the origin of any point is $\sqrt{(x_1/\sigma_1)^2 + (x_2/\sigma_2)^2}$, a one-dimensional variable which we designate chi. Of course, $\chi \ge 0$.

Since $x_1$ and $x_2$ are uncorrelated, the mean of this $\chi^2$, if many samples are taken, is

$$\text{Mean } \chi^2 = V\!\left(\frac{x_1}{\sigma_1}\right) + V\!\left(\frac{x_2}{\sigma_2}\right) = 1 + 1 = 2 = \text{the number of d.o.f.}$$


For the general case we have

$$\text{Mean } \chi_i^2 = i \qquad \text{Mean of } \chi^2 \text{ with } i \text{ d.o.f.} \quad [8:36]$$

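Since the mean of $\chi_i^2/i$ is 1 (the median is less than 1 because of the skewness of the distribution), any set of $x_1, x_2, \ldots$ values yielding a $\chi_i^2/i > 1$ (more precisely, greater than the median) tends to indicate variability factors in $\chi_i^2$ in excess of those postulated. We have

$$\chi_i^2 = \frac{x_1^2}{V_1'} + \frac{x_2^2}{V_2'} + \cdots + \frac{x_i^2}{V_i'} \qquad \text{Chi-square with } i \text{ degrees of freedom} \quad [8:37]$$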
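A simulation can illustrate two properties of $\chi_i^2$: its mean is $i$, and its median falls below the mean. This sketch builds $\chi_4^2$ as in [8:37] from four normal variables with arbitrary standard deviations. Any values would do, since each deviate is divided by its own $\sigma$. The seed and sample size are likewise arbitrary.

```python
import random
import statistics

rng = random.Random(11)
sigmas = [0.5, 2.0, 3.0, 10.0]     # arbitrary standard deviations, so i = 4
i = len(sigmas)

def chi_square_draw():
    # Sum of squared standard scores x_s / sigma_s, as in [8:37]
    return sum((rng.gauss(0.0, s) / s) ** 2 for s in sigmas)

draws = [chi_square_draw() for _ in range(100_000)]
mean = statistics.fmean(draws)
median = statistics.median(draws)
print(f"mean chi-square / i = {mean / i:.3f}")
print(f"median / i          = {median / i:.3f}")
```

The first ratio should sit near 1. The second should sit below it, which reflects the positive skewness mentioned above.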


From the structure shown in [8:37], $\chi_i^2/i$ is clearly a variance, as is also $\chi_i^2$ itself. To simplify future notation we write

$$V_i = \chi_i^2 \quad [8:38]$$

This $V_i$ is the quantity entering into a variance ratio

$$F_{i,j} = \frac{V_i/i}{V_j/j} \qquad \text{A variance ratio based upon } i \text{ (numerator) and } j \text{ (denominator) d.o.f.}$$

See [9:21].


As a limiting case, $\chi_i^2/i$ is a variance ratio in which the d.o.f. of the denominator variance is $\infty$, for $V_\infty/\infty = 1$. Accordingly

$$\frac{\chi_i^2}{i} = F_{i,\infty} \qquad \text{Chi-square expressed as a variance ratio} \quad [8:39]$$


The variance ratio distribution is that of a quotient. The distribution of the numerator alone of $F_{i,j}$ is that of $V_i/i$, and that of the denominator is that of $V_j/j$. Consider a scatter diagram, Chart VIII V, with dimensions $V_i/i$ and $V_j/j$. A fixed value of $F$ is represented by a straight line passing through the point (0,0). $F + \Delta F$ is another straight line slightly to the right of the former, and the frequency of cases between the two lines is $p_{\Delta F}$, the probability of a variance ratio between $F$ and $F + \Delta F$. The entire frequency to the right of the line $F$ is $P$, the probability of a variance ratio greater than $F$. Thus, though several steps removed, $F$ is a derivative of the normal distribution.

CHART VIII V

The $t^2$ (Student's "t" squared) distribution is an $F_{1,j}$ distribution, the distribution of a variance ratio having one degree of freedom in the numerator and $j$ degrees in the denominator.

The $x^2$ ($x$ of a unit normal distribution, squared) distribution is the distribution of an $F_{1,\infty}$, a variance ratio having one degree of freedom in the numerator and $\infty$ degrees of freedom in the denominator.

The $F$ distribution spans the field. However, the determination of $P$ from $F$ would involve a three-way table, the dimensions being $F$, $i$, and $j$, so its complete tabulation would be a colossal undertaking. This has been done, in a modified form, in the 500-odd pages of Pearson's Tables of the Incomplete Beta-Function (1934). Snedecor (1938), Fisher and Yates (1938), and others have tabled $F$ for selected values of $P$ and extended values of $i$ and $j$.

In Chapter IX we give a transformation of $F$ which avoids the need for any of these tables. It permits the use of a normal probability table instead, to obtain $P$ for any $F$.


CHAPTER IX 


THE STATISTICS OF ATTRIBUTES 

SECTION 1. SITUATIONS WHEREIN QUALITATIVE SERIES ARISE

Data are placed in categories when a quanti- 
tative relationship between the classes does not 
exist (more accurately "is not known to exist"), 
when the quantitative relationship is only vaguely 
surmisable, or when the known quantitative re- 
lationship between classes is neglected because 
the more primitive and simple qualitative methods 
would seem to suffice. Customary types of quali- 
tative series would be such as a number of men 
classified by nationality, or by eye color, or 
by vocation, or by religion, or by marital status, 
or by language spoken, etc. Handling such situ- 
ations involves the statistics of attributes, or 
of qualitative series, or of categories. The 
fundamental item is the frequency in a class.

SECTION 2. THE FREQUENCY IN A CLASS

Let the classes into which the data can fall be $a, b, c, \ldots, k$, and the observed frequencies be $f_a, f_b, \ldots, f_k$. Let $f_s$ stand for any one of these frequencies as $s$ takes values from $a$ to $k$ inclusive. For many purposes it is desirable to distinguish between those situations in
which the probability of a case falling in a class 
is known a priori and those in which it can only 
be surmised within limits from the observed pro- 
portions in the classes. If people are classi- 
fied into six vocational groups and the observed 
frequencies, when 100 are drawn at random, are 
18, 16, 22, 13, 14, 17, we can estimate the prob- 
ability of a measure falling in a given class. 
We would estimate the chance of a person being 
in the first class as .18 and can use this esti- 
mate, with precautions to be noted, in judging 
whether a subsequent sample is one drawn from the 
same population. This is the most usual quali- 
tative series. 

Another kind of series, not differing in outward appearance but actually different, exists when one knows, because of antecedent knowledge, the probabilities of a measure falling in the different classes. Such a series is a stochastic series. If a homogeneous cube with red, yellow, blue, violet, orange, and green faces is tossed 100 times, the frequency of occurrence of the various faces up might be

Color   red   yellow   blue   violet   orange   green
f        18     16      22      13       14      17

We can make tests of this series which are not possible with the other, for the a priori probabilities are 1/6 for each class.

Had the cube been a die with 1, 2, 3, 4, 5 
and 6 pips upon its faces, the frequencies of oc- 
currence would have the same fundamental proper- 
ties as before, though the X score is now a 
quantitative one. For some purposes the number 
of pips should be treated merely as categories, 
but for other purposes, as for example when paying 
gambling debts, they must be treated as quantitative measures. In the series involving the
cube with colored faces some genius might be able
to see a quantitative scale and use the results 
quantitatively, but the ordinary non-physicist 
could only treat it as a categorical series. 
This observation is made because it is true that 
series which seem to be merely qualitative may 
frequently be found, with further study, to be 
quantitative in a respect that has scientific 
and social consequences. A good example of this 
in psychology is in the quantitative measures 
that have been teased out of qualitative expres- 
sions of interests and attitudes. The enrich- 
ment of understanding is so great when qualita- 
tive data can be recast into a quantitative mold 
that the idea of doing so should be the initial 
thought of a student when first confronted with 
a qualitative series. 

We will first derive a few statistics pertaining to a stochastic variable. Suppose we know a priori that the probability of a measure drawn at random falling in class A is $p$, and that the probability of its not so falling is $q$ ($p + q = 1$). Let $X$ = the number found in class A when a sample of $N$ is drawn. If $N = 1$, then $X = 0$ or 1. The theoretical, or true, distribution would be the average of many such samples, and would have mean and moments as in Table IX A herewith.

TABLE IX A

COMPUTATION OF MOMENTS FOR A TWO-POINT DISTRIBUTION

                        f = PROB. NO.
                        WHEN T SAMPLES
  X     PROBABILITY     OF 1 EACH         fX     fX²    fX³    fX⁴
                        ARE DRAWN
  0          q              Tq             0      0      0      0
  1          p              Tp            Tp     Tp     Tp     Tp
SUMS         1              T             Tp     Tp     Tp     Tp
MEANS                                      p      p      p      p

$M$ (Mean of the $X$'s) $= p$

$V = p - p^2 = pq$, by [6:05]

$\mu_3 = p - 3p^2 + 2p^3 = pq(q - p)$, by [6:47]

$\mu_4 = p - 4p^2 + 6p^3 - 3p^4 = pq(1 - 3p + 3p^2)$, by [6:48]
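The moments of the two-point distribution can be checked with exact rational arithmetic. This minimal sketch computes the central moments directly, rather than through the raw-moment conversions [6:05], [6:47], and [6:48] that the text uses. The value $p = 1/5$ is an arbitrary illustration.

```python
from fractions import Fraction

def two_point_moments(p):
    """Mean and central moments mu_2..mu_4 of X, where X = 1 with
    probability p and X = 0 with probability q = 1 - p."""
    q = 1 - p
    dist = {0: q, 1: p}
    mean = sum(x * w for x, w in dist.items())
    mu = {k: sum((x - mean) ** k * w for x, w in dist.items()) for k in (2, 3, 4)}
    return mean, mu

p = Fraction(1, 5)
q = 1 - p
mean, mu = two_point_moments(p)
print(mean, mu[2], mu[3], mu[4])
```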

In a similar manner we can compute the moments 
of the X distribution if T (a large number) sam- 
ples of 2 each are drawn. 


The chance that the first case = 0 is $q$
The chance that the second case = 0 is $q$
The chance that both cases = 0 is $q^2$
The chance that the first = 0 and the second = 1 is $pq$
The chance that the second = 0 and the first = 1 is $qp$
The chance that the one = 0 and the other = 1 is $2pq$
The chance that the first case = 1 is $p$
The chance that the second case = 1 is $p$
The chance that both cases = 1 is $p^2$


The sum of these chances, $p^2 + 2pq + q^2 = (p + q)^2 = 1$, as of course must be the case. When samples of $N$ are drawn, the successive probabilities are the successive terms of $(p + q)^N$. We could compute moments as in Table IX A, with $X = 0$, 1, and 2 and probabilities $q^2$, $2pq$, and $p^2$. Instead, we will proceed immediately to the general case: the distribution of the frequencies in a class, for which the probability of a measure drawn at random being in the class is $p$, when $T$ (a large number) samples of $N$ each are drawn. The $(fX_i)$ in column 3 is the entry under $fX$ in column 2 for $X = i$; for example, $(fX_1) = TNp\,[q^{N-1}]$.



TABLE IX B

COMPUTATION OF THE MOMENTS OF THE FREQUENCY IN A CLASS

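$$\text{Mean of the frequencies in the class} = Np \quad [9:01]$$

$$\text{Variance} = Np(1 + Np - p) - (Np)^2 = Np(1 - p) = Npq \quad (\text{see [14:21]}) \quad [9:02]$$

By similar calculation with $fX^3$ and $fX^4$, or more simply by utilizing [14:18], we obtain

$$\mu_3 = Npq(q - p) \quad (\text{see [14:22]}) \quad [9:03]$$

$$\mu_4 = Npq\,[1 + 3(N - 2)pq] \quad (\text{see [14:23]}) \quad [9:04]$$

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{(q - p)^2}{Npq} \qquad \text{Skewness (Pearson) of frequency in a class} \quad [9:05]$$

$$\beta_2 = \frac{\mu_4}{\mu_2^2} = 3 + \frac{1 - 6pq}{Npq} \qquad \text{Kurtosis (Pearson) of frequency in a class} \quad [9:06]$$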


The higher moments can be obtained by Romanovsky's reduction formula [14:18]. Thus the complete distribution of frequencies in a class, due to random sampling, is available.

Let us examine this distribution for different values of $N$ and $p$.

If $p$ is very small and $N$ large, then the moments $V$ and $\mu_3$ approach the value $M$, and the higher moments are given by [14:34], [14:35], [14:36], etc. This is a Poisson distribution. The particular fact that $M = V$ underlies the computation of $\chi^2$ based upon frequencies in classes. If $N$ is large and $p$ not very small, then $\beta_1 \approx 0$, which is normal skewness, and $\beta_2 \approx 3$, which is mesokurtosis, and all higher moments approximate the normal distribution values. The approach to the normal distribution is most rapid if $p = q$.

For all values of $N$ we note that $\beta_2 = 3$ when $1 - 6pq = 0$, i.e., when $p = .5 \pm \tfrac{1}{6}\sqrt{3} = .211325$ or $.788675$. The distribution is platykurtic for values of $p$ between these two values. Otherwise it is leptokurtic, but the deviation from mesokurtosis is slight if $N$ is even of moderate size.
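Formulas [9:01] to [9:06], and the two mesokurtic values of $p$, can be verified exactly by summing over the binomial terms of $(p + q)^N$. The choice $N = 12$, $p = 1/3$ is arbitrary, and `binomial_central_moments` is a helper written only for this illustration.

```python
from fractions import Fraction
from math import comb, sqrt

def binomial_central_moments(N, p):
    """Mean and central moments mu_2..mu_4 of the frequency in a class,
    computed exactly from the terms of (q + p)^N."""
    q = 1 - p
    probs = [comb(N, x) * p**x * q**(N - x) for x in range(N + 1)]
    mean = sum(x * w for x, w in enumerate(probs))
    mu = {k: sum((x - mean) ** k * w for x, w in enumerate(probs)) for k in (2, 3, 4)}
    return mean, mu

N, p = 12, Fraction(1, 3)
q = 1 - p
mean, mu = binomial_central_moments(N, p)
beta1 = mu[3] ** 2 / mu[2] ** 3
beta2 = mu[4] / mu[2] ** 2
print(mean, mu[2], beta1, beta2)

# Values of p for which 1 - 6pq = 0, so that beta2 = 3 for every N
p_low, p_high = 0.5 - sqrt(3) / 6, 0.5 + sqrt(3) / 6
print(round(p_low, 6), round(p_high, 6))
```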

We can use the binomial distribution to get a precise test of the significance of a difference between an observed and a theoretical class frequency. With no other information to help in a solution, suppose we are given the fact that Joe matches a coin with John and wins eight times in ten. What is the likelihood that, if the coins are unbiased and the throws fair, so one-sided an outcome would arise as a matter of chance? The assumption is that $p = q = .5$, so the distribution of probabilities is that given by the successive terms of $(.5 + .5)^{10}$, which are as follows:


TABLE IX C

CERTAIN BINOMIAL PROBABILITIES

Joe wins = X      Probability
      0          .0009765625
      1          .0097656250
      2          .0439453125
      3          .1171875000
      4          .2050781250
      5          .2460937500
      6          .2050781250
      7          .1171875000
      8          .0439453125
      9          .0097656250
     10          .0009765625

Thus the chance probability that Joe would win eight times in ten is .0439453125. The probability that he would have eight or more successes is .0546875 (= .0009765625 + .0097656250 + .0439453125). The probability that an outcome as extreme as this would arise as a matter of chance is double this, or .109375. This last is the figure that ordinarily concerns one. It is the $P$ of goodness-of-fit tests, and states the chance that a situation as extreme as the one observed would arise as a matter of chance.
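The exact computation can be reproduced with integer arithmetic. This is a minimal sketch that assumes fair coins ($p = 1/2$). Doubling the upper tail to get the two-sided figure is valid here only because the distribution is symmetric when $p = q$.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(n, k, p=Fraction(1, 2)):
    """Probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, wins = 10, 8
p_exact = binomial_pmf(n, wins)                                  # exactly 8 wins
p_upper = sum(binomial_pmf(n, k) for k in range(wins, n + 1))    # 8 or more wins
p_two_sided = 2 * p_upper                                        # as extreme, either way
print(float(p_exact), float(p_upper), float(p_two_sided))
# prints: 0.0439453125 0.0546875 0.109375
```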

The usual way of obtaining an approximate $P$ is by means of $\chi^2$, the square (or squared) contingency. We will apply it to the present problem and compare the resulting $P$ with the correct answer, .109375.

SECTION 3. CHI-SQUARE

Let $f_s$ = the observed frequency in the cell, or class, $s$  [9:07]

Let $\tilde f_s$ = the frequency in this cell according to the theory being tested  [9:08]

$f_s - \tilde f_s$ = the cell divergence  [9:09]

$(f_s - \tilde f_s)^2$ = the cell square divergence

$\chi_s^2 = \dfrac{(f_s - \tilde f_s)^2}{\tilde f_s}$ = the cell square contingency  [9:10]

$\chi^2 = \sum \chi_s^2 = \sum \left[\dfrac{(f_s - \tilde f_s)^2}{\tilde f_s}\right]$ = the sum of such for all cells = the square contingency  [9:11]


The number of degrees of freedom in $\chi^2$, as fully explained in Section 4, is here designated d.o.f. $F_{i,j}$ is a variance ratio with $i$ degrees of freedom in the numerator and $j$ in the denominator.

$$\frac{\chi^2}{\text{d.o.f.}} = F_{\text{d.o.f.},\infty}$$

This is a variance ratio with d.o.f. degrees of freedom in the numerator and infinity degrees of freedom in the denominator.


$P$ from $F$ is the chance that a situation deviating from the theoretical one as much as the observed one does would arise as a matter of chance. For the coin-matching problem, $P$ from $\chi^2$ will be only an approximate answer. The $\chi^2/\text{d.o.f.}$ and $F$ distributions derive from the assumption that the elementary variable is normally distributed, and reference to [9:05] and [9:06] shows this not to be strictly true for frequencies in a class. The cell square contingency is an approximation to a critical ratio squared for, if $p$ is small, $\tilde f_s = Np \approx Npq = V$. Wilson, Hilferty, and Maher* (1931) observe, for a more extreme example than ordinarily occurs, that it makes no particular difference whether $Np$ or $Npq$ is employed in the denominator when calculating the cell square contingencies. We may therefore think of the cell square contingency as very similar to a squared critical ratio. If these are separately normally distributed, the sum of a number of such is precisely $\chi^2$, as tabled. For the coin-matching problem we have:


Number of Successes    Number of Failures
        8                      2
        5                      5
        3                     -3
       1.8                    1.8


* Wilson, Hilferty, and Maher note that with $k$ cells, d.o.f. $= (k - 1)$, and that $\chi^2/(k - 1)$ and $\chi'^2/k$, in which $\chi'^2 = \sum\left[\dfrac{(f_s - \tilde f_s)^2}{Npq}\right]$, are very similarly distributed.

The four entries in each cell are, in order,

$f_s$ = the observed cell frequency

$\tilde f_s$ = the theoretical cell frequency

$f_s - \tilde f_s$ = the cell divergence

$(f_s - \tilde f_s)^2/\tilde f_s = \chi_s^2$ = the cell square contingency

$\chi^2 = \sum \chi_s^2 = 3.6$

The number of degrees of freedom is 1 for, with 10 matchings and 8 successes, the number of failures (2) is dictated. $\chi^2/\text{d.o.f.} = 3.6/1$, and from tables we find $P = .05775$. Any of the following tables will serve: $\chi^2$ with 1 degree of freedom; $F_{1,\infty}$ with 1 degree of freedom in the numerator and $\infty$ in the denominator; $t^2$ with $\infty$ degrees of freedom in the denominator; or $x^2$ of a normal distribution. This result is not close to the correct answer, but Yates (1938) has pointed out the need for a correction in $\chi^2$ in the case of one degree of freedom. The correction is accomplished by "reducing by .5 the values which are greater than expectation and increasing by .5 those which are less than expectation."

The intent of Yates' correction is to adjust for the fact that observed frequencies must be integers, while the theoretical distribution (the $\chi^2$ distribution) is continuous. If a value such as 7.51 in a continuous distribution must be assigned an integral value, it is called 8.0. In other words, the value 8.0 in fact covers all values from 7.5 to 8.5.

If we therefore consider our observed number of successes to be 7.5 and of failures 2.5, and recompute, we obtain $\chi^2 = 2.5$ and $P = .11385$. This is in fair agreement with the precise value (.109375).
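The two $\chi^2$ computations for the coin-matching problem can be sketched as follows. With one degree of freedom, $\chi^2$ is the square of a unit normal deviate, so $P$ is twice the normal area beyond $\sqrt{\chi^2}$. The sketch uses Python's `statistics.NormalDist` for that area. The function names are illustrative. The `max(..., 0)` guard, which keeps the correction from overshooting zero, is our own addition.

```python
from statistics import NormalDist

def chi_square(observed, expected, yates=False):
    """Sum of cell square contingencies (f - f~)^2 / f~, optionally with
    Yates' correction: move each observed value .5 toward expectation."""
    total = 0.0
    for f, f_theory in zip(observed, expected):
        divergence = abs(f - f_theory)
        if yates:
            divergence = max(divergence - 0.5, 0.0)
        total += divergence ** 2 / f_theory
    return total

def p_one_dof(chi2):
    """P for chi-square with 1 d.o.f.: both tails of a unit normal."""
    return 2.0 * (1.0 - NormalDist().cdf(chi2 ** 0.5))

observed, expected = [8, 2], [5, 5]
for yates in (False, True):
    c = chi_square(observed, expected, yates)
    print(f"Yates={yates}: chi-square = {c:.2f}, P = {p_one_dof(c):.5f}")
```

The corrected $P$ lands much closer to the exact .109375 than the uncorrected one does.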

SECTION 4. CHI-SQUARE FROM THE GENERAL CONTINGENCY TABLE

Let us now consider the problem wherein a priori probabilities for the different cells are unknown. Suppose 100 people drawn at random in city $\alpha$, 200 drawn at random in city $\beta$, and 300 drawn at random in city $\gamma$ fall into vocational groups as given in Table IX D. We can then ask and answer the question: "Are the discrepancies in the relative frequencies for these three series such as might readily arise as a matter of sampling if all three drawings are in fact from the same parent population, or do they exceed this, so that we must believe that they come from different parent populations, i.e., that communities $\alpha$, $\beta$, and $\gamma$ are of different sorts?"

We must compare these observed cell frequencies with theoretical cell frequencies, but we have no a priori values. However, on the assumption that the three communities are the same, we can assign theoretical values which agree with the marginal totals $N_s$ and $N_t$. To illustrate for cell $\gamma A$: half the cases are $\gamma$ cases and 1/6 of them are A cases. Clearly 1/2 times 1/6 of them, or 1/12 of them [(1/2)(1/6)600 = 50], must in the population be $\gamma A$ cases. This, accordingly, is $\tilde f_{\gamma A}$, the theoretical $\gamma A$ cell frequency. In general, $\dfrac{N_s}{N} \times \dfrac{N_t}{N} \times N$ is the theoretical cell frequency for the cell at the intersection of row $s$ and column $t$, which we designate cell $st$.

The 600 people whose status is recorded in Table IX D constitute a sample. They are, with certain limitations, to be thought of as randomly chosen from an infinite population. Without those limitations, a second sample might not have 600 cases, or 100 from City $\alpha$, or 100 in Vocation A, etc. But we are not interested in the facts that there were 600 cases, 100 from City $\alpha$, 100 in Vocation A, etc. That there were 100 from City $\alpha$ was probably dictated by the method of sampling. That there are 100 in Vocation A, though perhaps not dictated by the method of collecting the data, is nevertheless not the issue that concerns us. Our concern is the relative frequencies, for the different cities, of those in a certain vocation. We shall therefore look upon this particular sample as one of an infinite number of possible samples, for each of which the same marginal totals (100, 200, 300, 100, 90, 140, 75, 85, 110) hold. The theory that we shall test (are the relative frequencies in the vocations from city to city such as might be expected due to sampling?) is a posteriori to the knowledge of the marginal totals. That is to say, these marginal totals are an intrinsic part of the hypothesis and not issues being tested. Thus the hypothesis being tested is subject to the following restrictions upon the class frequencies:

$$\begin{aligned}
\textstyle\sum_s \sum_t f_{st} &= N\\
f_{\alpha A} + f_{\alpha B} + f_{\alpha C} + f_{\alpha D} + f_{\alpha E} + f_{\alpha F} &= N_\alpha\\
f_{\beta A} + f_{\beta B} + f_{\beta C} + f_{\beta D} + f_{\beta E} + f_{\beta F} &= N_\beta\\
f_{\alpha A} + f_{\beta A} + f_{\gamma A} &= N_A\\
f_{\alpha B} + f_{\beta B} + f_{\gamma B} &= N_B\\
f_{\alpha C} + f_{\beta C} + f_{\gamma C} &= N_C\\
f_{\alpha D} + f_{\beta D} + f_{\gamma D} &= N_D\\
f_{\alpha E} + f_{\beta E} + f_{\gamma E} &= N_E
\end{aligned} \qquad [9:14]$$


There are eight restrictions upon the frequencies in the classes. These restrictions are of three sorts. One is connected with $N$, the size of the particular sample. The number of the second sort is one less than the number of rows, and the number of the third sort is one less than the number of columns. None of these restrictions can be derived from the others, so there are just eight independent linear restrictions. A restriction such as

$$f_{\gamma A} + f_{\gamma B} + f_{\gamma C} + f_{\gamma D} + f_{\gamma E} + f_{\gamma F} = N_\gamma$$

is not included, for it is derivable from the first three given.

Having thus fixed the marginal totals, we know the exact theoretical cell frequencies under the hypothesis. For any cell, such as that at the intersection of row $s$ and column $t$, this theoretical cell frequency is

$$\tilde f_{st} = \frac{N_s}{N} \times \frac{N_t}{N} \times N = p_s\,p_t\,N \qquad \text{Theoretical cell frequency under hypothesis of independence} \quad [9:15]$$

in which $p_s$ is the proportion in row $s$, $p_t$ the proportion in column $t$, and $N$ the size of the sample.
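Here is a minimal sketch of [9:15] that computes every theoretical cell frequency from the marginal totals quoted in the text. The body of Table IX D is not reproduced here, so only the theoretical frequencies and the degrees of freedom are computed, not $\chi^2$ itself. One assumption is ours: the six vocation totals are assigned to vocations A through F in the order listed. That ordering agrees with the text's 100 cases in Vocation A.

```python
from fractions import Fraction

def theoretical_frequencies(row_totals, col_totals):
    """f~_st = (N_s / N) * (N_t / N) * N = N_s * N_t / N for every cell."""
    N = sum(row_totals)
    if N != sum(col_totals):
        raise ValueError("row and column totals must have the same sum")
    return [[Fraction(r * c, N) for c in col_totals] for r in row_totals]

cities = {"alpha": 100, "beta": 200, "gamma": 300}
vocations = {"A": 100, "B": 90, "C": 140, "D": 75, "E": 85, "F": 110}   # order assumed

table = theoretical_frequencies(list(cities.values()), list(vocations.values()))
restrictions = 1 + (len(cities) - 1) + (len(vocations) - 1)   # 8, as in [9:14]
dof = len(cities) * len(vocations) - restrictions              # 18 cells - 8 = 10

for name, row in zip(cities, table):
    print(name, [float(f) for f in row])
print("d.o.f. =", dof)
```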

An elementary theorem of probability is that the probability of the joint occurrence of any number of independent events is the product of the probabilities of these events separately. The probability of being in row $s$ is $p_s$, and in column $t$ it is $p_t$. So the probability of a case falling in cell $st$ is $p_s p_t$, and the "expected" number in this cell is $p_s p_t N$, frequently designated $E_{st}$. We shall employ $\tilde f_{st}$ instead, for of course we should not actually expect a sample to have, for any cell, the exact relative frequency of the population.

To reinterpret: the statement in the first sentence of this section, that a priori class frequencies are unknown, is false when the hypothesis being tested is so cast as to be contingent upon the observed marginal totals. Under a hypothesis so conditioned we do know exactly the a priori, or theoretical, cell frequencies.

Clearly, by and large, the observed class frequencies cannot vary as much from the theoretical class frequencies when linear restrictions are placed upon the class frequencies as when there are no such restrictions. Fisher (1922) established that a total $\chi^2$ from a table having $(i + g)$ cells and $g$ independent linear restrictions upon the cell frequencies has exactly the same form of distribution as a $\chi^2$ from a table having $i$ cells and no restrictions upon the cell frequencies. It is not the number of cells in the table, but $i$, the number of degrees of freedom, which is crucial in determining the $\chi^2$ distribution.

TABLE IX D

VOCATIONAL FREQUENCIES FOR DIFFERENT COMMUNITIES

$\chi^2 = \sum \chi^2_{st} = 10.134 \qquad \text{d.o.f.} = 10 \qquad \dfrac{\chi^2}{\text{d.o.f.}} = 1.0134$

The four entries in a cell are, in order,

$f_{st}$ = the observed cell frequency  [9:16]

$\tilde f_{st}$ = the theoretical cell frequency  [9:17]

$f_{st} - \tilde f_{st}$ = the cell divergence  [9:18]

$\chi^2_{st} = (f_{st} - \tilde f_{st})^2/\tilde f_{st}$ = the cell square contingency  [9:19]

For the table entire we have

$$\chi^2 = \sum \chi^2_{st} \qquad \text{Chi-square, the square contingency} \quad [9:20]$$


SECTION 5. TRANSFORMATIONS NORMALIZING THE VARIANCE RATIO DISTRIBUTION

$V_i$ ($= \chi_i^2$ as given in [8:38]) is a variance of the sum of $i$ independent standard scores, and $V_j$ is another similar variance, independent of the former, having $j$ degrees of freedom.

$$F_{i,j} = \frac{V_i/i}{V_j/j} \qquad \text{A variance ratio having } i \text{ d.o.f. in the numerator and } j \text{ d.o.f. in the denominator} \quad [9:21]$$

If the hypothesis is that $V_i/i = V_j/j$, the divergence of $F_{i,j}$ from 1.00 (more precisely, from its median value) is a measure of the improbability of the hypothesis. We obtain a probability measure of any such divergence as follows:


Let $f_{i,j} = \sqrt[3]{F_{i,j}}$.

Let $\theta_i = \dfrac{3}{\sqrt{2}} - \dfrac{\sqrt{2}}{3i}$, and $\theta_j$ similarly.

Table IX E, which is an abridgement of the table of $\theta$ values given in The Kelley Statistical Tables, revision of 1947, facilitates the use of [9:24].

TABLE IX E

TABLE OF θ VALUES

  i     θ_i        i     θ_i        i     θ_i         i      θ_i
  1   1.6499      17   2.0936      33   2.1070       49       …
  2   1.8856      18   2.0951      34   2.1075       50       …
  3   1.9642      19   2.0965      35   2.1079       60    2.1135
  4   2.0035      20   2.0978      36   2.1082       70       …
  5   2.0270      21   2.0989      37   2.1086       80    2.1154
  6   2.0428      22   2.0999      38   2.1089       90    2.1161
  7   2.0540      23   2.1008      39   2.1092      100    2.1166
  8   2.0624      24   2.1017      40   2.1095      200    2.1190
  9   2.0689      25   2.1025      41   2.1098      300    2.1197
 10   2.0742      26   2.1032      42   2.1101      400    2.1201
 11   2.0785      27   2.1039      43   2.1104      500    2.1204
 12   2.0820      28   2.1045      44   2.1106      600    2.1205
 13   2.0851      29   2.1051      45   2.1108      700    2.1206
 14   2.0876      30   2.1056      46   2.1111      800    2.1207
 15   2.0899      31   2.1061      47   2.1113      900    2.1208
 16   2.0919      32   2.1066      48   2.1115     1000    2.1208
                                                   ∞     2.121320

Error < .0002

$$d = \frac{\theta_j\,f_{i,j} - \theta_i}{\sqrt{\dfrac{1}{i} + \dfrac{f_{i,j}^2}{j}}} \qquad \text{The } d \text{ normalizing transformation} \quad [9:24]$$


Treating $d$ as a deviate in a unit normal distribution will yield $q$, the proportion to the right of the point $d$. This is the desired $P$ for $F_{i,j}$: the probability that a variance ratio as great as that observed would arise as a matter of chance under the hypothesis that $V_i/i = V_j/j$. This approximation is close [see Kelley, Tables (1947)] if $j > 3$. If $j < 3$, the $x_d$ transformation herewith is close.


The $x_d$ normalizing transformation (use when $j < 3$)  [9:25]


A $P$ from $d$ [9:24], or from $x_d$ [9:25], having a value in the neighborhood of .50 is most confirmatory of the hypothesis. A small value of $P$ tends to disprove it by asserting that a situation as extreme as the observed situation is very unlikely to arise as a matter of chance. A large value of $P$ also tends to disprove it, by asserting that as close a fit (agreement between $V_i/i$ and $V_j/j$) as the observed fit is very unlikely to arise as a matter of chance. We elaborate upon this point in the next paragraphs, for it has frequently been misunderstood.

We may write $V_i = V_{i(\text{non-chance})} + V_{i(\text{chance})}$. That is, the observed variance differs from the underlying true variance by an amount $V_{i(\text{chance})}$, commonly called the error variance, which is due to sampling. Also $V_j = V_{j(\text{non-chance})} + V_{j(\text{chance})}$, though in most cases the hypothesis is such that $V_{j(\text{non-chance})} = 0$. Then $V_j = V_{j(\text{chance})}$, and the $F_{i,j}$ test is to see whether $V_{i(\text{non-chance})}$ exists. Expressed in terms of the city-vocation distribution problem, the test is to see whether the relative differences in frequencies of vocations from city to city are such that chance is insufficient to account for them. Thus $F_{i,j}$ becomes

$$F_{i,j} = \frac{\left[V_{i(\text{non-chance})} + V_{i(\text{chance})}\right]/i}{V_{j(\text{chance})}/j}$$

Since $V_{i(\text{chance})}/i$ differs from $V_{j(\text{chance})}/j$ only as a matter of chance, the variance ratio hovers around the value 1.0 when $V_{i(\text{non-chance})} = 0$.

If $V_{i(\text{non-chance})}$ is substantial and $V_{i(\text{chance})}/i \approx V_{j(\text{chance})}/j$, then $F_{i,j}$ is much greater than 1.0 and $P$ is small. This situation is easy to interpret. We simply believe the hypothesis disproved because of the presence of non-chance factors in $V_i$.

If $F_{i,j}$ is much less than 1.0, we must believe not only that $V_{i(\text{non-chance})}$ is nonexistent or negligible, but also that $V_{i(\text{chance})}$ is unreasonably small, again disproving the hypothesis. This situation is the more difficult to account for. It can be brought about, as certain poorly devised studies bear witness, if an experimental procedure has employed controls that to a degree eliminate, in $V_i$, the operation of the same chance factors that operate in $V_j$.

In short, if $F_{ij}$ is large and P small, consider the hypothesis untenable because of variability factors in $V_i$ in excess of the hypothesis; and if $F_{ij}$ is small and P large, consider it untenable because of faulty controls in the experimental procedure. This statement of the case holds when the denominator variance, $V_j$, is the chance variance. The writer recommends that $F_{ij}$ be so written that this is always the case, even if this results in an $F_{ij}$ which is less than 1.0. In order to use certain tables it has been customary so to write F that it is greater than 1.0. It is only necessary to note, if $P_{ij}$ is P from $F_{ij}$ and $P_{ji}$ is P from $F_{ji}$, that

$$P_{ij} = 1 - P_{ji} \qquad [9{:}26]$$

to see that no matter how it is written to conform to tables it can always be written with the error variance in the denominator for purposes of interpretation.

Since $\chi^2$/d.o.f. from Table IX D is equal to $F_{i\infty}$, in which $i = 10$, we have

$$F_{10,\infty} = \frac{10.134}{10} = 1.0134$$

Since $F_{10,\infty}$ so nearly equals 1.0 we can, without computing P, be convinced that cell divergences from theoretical values as great as the observed divergences could readily arise as a matter of chance. The determination of P merely confirms this. We have

$$
\begin{aligned}
F_{10,\infty} &= 1.0134\\
\cdots &= 2.121320\\
F_{10,\infty}^{1/3} &= 1.00445\\
1/\sqrt{10} &= .316228\\
d &= .17885\\
\cdots &= 2.0742\\
P &= .4290
\end{aligned}
$$
Thus there are 43 chances in 100 that, if $V_i/i = 1.0$, a sample value as large as 1.0134 would occur.
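The arithmetic above can be checked with a short Python sketch. It is an illustration under stated assumptions, not the author's procedure. We assume the text's d is a Wilson-Hilferty cube-root normal deviate, which agrees with the tabled cube root 1.00445 and with P = .4290. For comparison, the sketch also computes the exact upper-tail $\chi^2$ probability, which has a closed form when the degrees of freedom are even. The function names are our own.

```python
import math
from statistics import NormalDist


def wilson_hilferty(f_ratio, dof):
    """Normal deviate and upper-tail P for chi-square/d.o.f. = f_ratio, using the
    Wilson-Hilferty cube-root approximation (an assumed stand-in for the text's d)."""
    centre = 1.0 - 2.0 / (9.0 * dof)          # approx. mean of (chi2/dof) ** (1/3)
    spread = math.sqrt(2.0 / (9.0 * dof))     # approx. standard deviation
    d = (f_ratio ** (1.0 / 3.0) - centre) / spread
    return d, 1.0 - NormalDist().cdf(d)


def exact_upper_tail_even(chi2, dof):
    """Exact P(chi-square >= chi2) for even d.o.f.:
    exp(-chi2/2) * sum_{k=0}^{dof/2 - 1} (chi2/2) ** k / k!"""
    if dof % 2:
        raise ValueError("closed form needs even degrees of freedom")
    half = chi2 / 2.0
    term = total = 1.0
    for k in range(1, dof // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


d, p_approx = wilson_hilferty(10.134 / 10, 10)
p_exact = exact_upper_tail_even(10.134, 10)
print(f"d = {d:.4f}  P(approx.) = {p_approx:.4f}  P(exact) = {p_exact:.4f}")
```

Both routes give P close to .43. The lower-tail probability, which is the one that matters when F is much less than 1.0, is simply `1 - p_exact`.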

We find the data highly consistent with the hypothesis tested, but we should not say that the hypothesis has been "proven", for the data may be consistent with any number of other hypotheses. Though a high or a low P can be taken as "disproof" of the hypothesis, one in the neighborhood of .5 is only indicative of consistency with the hypothesis, not of its proof. We never prove a hypothesis in the sense that it and none other is tenable, but only at times have evidence which does not disprove it.

The accuracy of P from the d, or $x_d$, transformation extends to the third or fourth decimal place and is sufficient for all experimental purposes. A more serious hazard is consequent to the assumption that $(f_{st} - \bar{f}_{st})$ is normally distributed. The actual divergence from normality is due to the fact that $f_{st}$ takes integral values only. Further, if $p_{st}$ and N are both small, the distribution of $(f_{st} - \bar{f}_{st})$ approaches a Poisson and not a normal distribution. To lessen this hazard Kelley (1923) recommended that $(p_{st}N)$ be > 1; Fisher and Yates (1938) recommend such a grouping that $(p_{st}N)$ be not < 5.0; and Cochran (1942) provides data upon the requisite minimal size of $(p_{st}N)$ at the one and the five per cent levels of significance for different degrees of freedom. He writes, "With only a single small expectation [i.e., only one cell with small $(p_{st}N)$] the conventional limit of five for the smallest expectation appears unduly high. At the 5 per cent level, the tabular $\chi^2$ distribution may be used without undue error with an expectation as low as 0.5, and at the one per cent level with an expectation as low as 2." With one degree of freedom Yates' correction is to be applied, though Cochran points out certain infrequent exceptions to this. A correction of the Yates type is in order when the number of degrees of freedom exceeds one, but Cochran has shown that its value is small and its precise computation rather complicated.
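The following Python sketch computes the expected frequencies $p_{st}N$ for a contingency table and Pearson's $\chi^2$ with or without Yates' correction. It also lists any cells whose expectation falls below a chosen limit. The 2 × 2 counts are hypothetical. The default limit of 5.0 is the Fisher and Yates rule quoted above; to apply Cochran's more lenient limits, pass a different value through the `limit` argument.

```python
def expected_counts(table):
    """Expected frequencies p_st * N under independence: row total * column total / N."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    return [[r * c / n for c in col_tot] for r in row_tot]


def chi_square(table, yates=False):
    """Pearson chi-square and d.o.f.; Yates' correction (shrink each |f - expected|
    by 0.5) is used only when requested and only with one degree of freedom."""
    expected = expected_counts(table)
    dof = (len(table) - 1) * (len(table[0]) - 1)
    use_yates = yates and dof == 1
    chi2 = 0.0
    for obs_row, exp_row in zip(table, expected):
        for f, e in zip(obs_row, exp_row):
            dev = abs(f - e)
            if use_yates:
                dev = max(dev - 0.5, 0.0)
            chi2 += dev * dev / e
    return chi2, dof


def small_expectations(table, limit=5.0):
    """(row, column, expectation) for every cell whose expectation is below `limit`."""
    return [(s, t, e)
            for s, row in enumerate(expected_counts(table))
            for t, e in enumerate(row) if e < limit]


table = [[12, 5], [3, 10]]             # hypothetical 2 x 2 counts
print(chi_square(table))               # uncorrected
print(chi_square(table, yates=True))   # with Yates' correction
print(small_expectations(table))       # [] : every expectation is at least 5
```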

A fundamental property of $\chi^2$'s is given by the following equation, in which the subscripts indicate the respective degrees of freedom:

$$\chi^2_{i+j+\cdots+t} = \chi^2_i + \chi^2_j + \cdots + \chi^2_t \qquad [9{:}27]$$

Additive property of $\chi^2$


P from $\chi^2_{i+j+\cdots+t}$ is the probability that chance would yield divergences-squared, in all the situations taken together, whose sum total is as great as that observed. This frequently enables the combination of a number of experiments so as to obtain a single measure of confidence. However, if the sign of a divergence is material, the method does not answer the question of importance. What is then required is a method based upon deviations with their sundry plus and minus signs and not one based upon deviations squared.
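Here is a minimal sketch of the additive property. The $\chi^2$ values and degrees of freedom from independent experiments are summed, and P for the total is approximated with the Wilson-Hilferty cube-root normal deviate. That approximation is our choice and not necessarily the one used in the text. Only the first pair comes from the example above; the other two are hypothetical. As the text warns, the pooled value ignores the signs of the divergences.

```python
import math
from statistics import NormalDist


def upper_tail_wh(chi2, dof):
    """Approximate P(chi-square >= chi2) via the Wilson-Hilferty cube-root deviate."""
    d = ((chi2 / dof) ** (1 / 3) - (1 - 2 / (9 * dof))) / math.sqrt(2 / (9 * dof))
    return 1 - NormalDist().cdf(d)


def combine(results):
    """Pool independent (chi2, dof) pairs by the additive property [9:27]."""
    total_chi2 = sum(c for c, _ in results)
    total_dof = sum(k for _, k in results)
    return total_chi2, total_dof, upper_tail_wh(total_chi2, total_dof)


experiments = [(10.134, 10), (7.2, 6), (15.9, 12)]  # only the first pair is from the text
chi2, dof, p = combine(experiments)
print(f"pooled chi-square = {chi2:.3f} on {dof} d.o.f., P about {p:.3f}")
```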

$\chi^2$ has been extensively used in such contingency situations as the City-Vocation problem here given as an illustration, but theoretically it is more exactly applicable to quantitative problems wherein the variable is both continuous and more nearly normal than is the frequency in a class variable. Generally, where quantitative data are involved, more informative regression, analysis of variance, and variance ratio ($F_{ij}$, not merely $F_{i\infty}$, which equals $\chi^2$/d.o.f.) methods are available, as illustrated in Chapter X. The student should not believe that the $\chi^2$ method is in any way limited to contingency data.


CHAPTER X 

ESTIMATION, REGRESSION, AND CORRELATION 

SECTION 1. A BRIEF PERSPECTIVE OF THE FIELD

It has been claimed that every properly con- 
ducted experiment should hinge upon some null hy- 
pothesis. Most correlation problems are not so 
stated. If a new psychological test is being developed for use in connection with college admission, the initial null hypothesis that the test
correlates with college success to the extent zero 
is not made. If the correlation found is .70 (st. 
er. = .03) something very useful is now known that 
was not known before, but there has been no null 
hypothesis. We shall here be concerned with cor- 
relation and regression methods without regard to 
null hypotheses, but will later note how nicely 
they fit in with certain null hypotheses and how 
gracefully they contribute to the analysis of 
variance. 

The correlation coefficient, $r_{01}$, is a measure of mutual or reciprocal relationship and is frequently informative of itself, though its primary utility is simply as an essential measure entering into an estimation equation.
$$\bar{X}_0 = a_{01} + b_{01}X_1 \qquad [10{:}01]$$

Linear regression equation of $X_0$ upon $X_1$

$$a_{01} = M_0 - b_{01}M_1 \qquad [10{:}02]$$

$$b_{01} = r_{01}\frac{\sigma_0}{\sigma_1} \qquad [10{:}03]$$

Regression coefficient of $X_0$ upon $X_1$

Thus

$$\bar{X}_0 = M_0 - r_{01}\frac{\sigma_0}{\sigma_1}M_1 + r_{01}\frac{\sigma_0}{\sigma_1}X_1 \qquad [10{:}01a]$$

When deviation scores are employed [10:01] becomes

$$\bar{x}_0 = b_{01}x_1 = r_{01}\frac{\sigma_0}{\sigma_1}x_1 \qquad [10{:}04]$$

and by parity

$$\bar{x}_1 = b_{10}x_0 = r_{01}\frac{\sigma_1}{\sigma_0}x_0 \qquad [10{:}04a]$$

When standard scores (see [8:23]) are employed [10:01] becomes

$$\bar{z}_0 = r_{01}z_1 \qquad [10{:}05]$$

and by parity

$$\bar{z}_1 = r_{01}z_0 \qquad [10{:}05a]$$

revealing the reciprocal nature of the relationship.
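The relations among $r_{01}$, $b_{01}$, $a_{01}$, and $b_{10}$ can be checked numerically. In this Python sketch the paired scores are hypothetical, and σ uses the divisor N, as in the text. It confirms two things: the product of the two regression coefficients is $r_{01}^2$, and in standard-score units the slope reduces to $r_{01}$.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def sd(xs):
    """Standard deviation with divisor N, as in the text's sigma."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))


def regression_summary(x0, x1):
    """Return r01, b01, a01 and b10 for paired scores X0, X1."""
    m0, m1, s0, s1 = mean(x0), mean(x1), sd(x0), sd(x1)
    cov = sum((a - m0) * (b - m1) for a, b in zip(x0, x1)) / len(x0)
    r01 = cov / (s0 * s1)
    b01 = r01 * s0 / s1        # slope of X0 upon X1
    a01 = m0 - b01 * m1        # intercept: the line passes through (M1, M0)
    b10 = r01 * s1 / s0        # slope of X1 upon X0, by parity
    return r01, b01, a01, b10


x0 = [64.0, 66.5, 67.0, 68.5, 69.0, 71.5, 72.0]   # hypothetical scores
x1 = [65.0, 66.0, 68.0, 67.5, 70.0, 70.5, 73.0]
r01, b01, a01, b10 = regression_summary(x0, x1)
z_slope = b01 * sd(x1) / sd(x0)   # slope when both variables are in standard scores
print(f"r01={r01:.4f} b01={b01:.4f} a01={a01:.3f} b10={b10:.4f}")
print(f"b01*b10={b01 * b10:.4f} (equals r01^2), standard-score slope={z_slope:.4f}")
```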

There are many correlation formulas because of the variety of types of data and because of the need for various corrections in the raw results. We will designate the four most important corrections by preceding subscripts: a for attenuation, c for coarseness of grouping, f for fineness of grouping, and s for shrinkage. Which of these corrections applies varies from situation to situation because of the differences in types of data and in the uses to which results may be put. The corrections listed in Table X A have been developed in connection with quantitative data, and the fineness of grouping correction is of prime importance in connection with qualitative data.

TABLE X A

NOTATION OF SUNDRY CORRECTIONS TO CORRELATION COEFFICIENTS

$r_{01}$: raw r between quantitative variables $X_0$ and $X_1$

${}_{a_0a_1}r_{01}$, also designated $r_{\infty\infty}$: $r_{01}$ corrected for attenuating effect of errors of measurement of $X_0$ and $X_1$

${}_{a_0}r_{01}$, also designated $r_{\infty 1}$: $r_{01}$ corrected for attenuating effect of errors of measurement of $X_0$

${}_{a_1}r_{01}$, also designated $r_{0\infty}$: $r_{01}$ corrected for attenuating effect of errors of measurement of $X_1$

${}_{c}r_{01}$, or ${}_{c_0c_1}r_{01}$: $r_{01}$ corrected for equally spaced coarse grouping of $X_0$ and $X_1$

${}_{c_0}r_{01}$: $r_{01}$ corrected for equally spaced coarse grouping of $X_0$

${}_{c_1}r_{01}$: $r_{01}$ corrected for equally spaced coarse grouping of $X_1$

${}_{c'}r_{01}$, or ${}_{c'_0c'_1}r_{01}$: $r_{01}$ corrected for unequally spaced coarse grouping of $X_0$ and $X_1$

${}_{s}r_{01}$: $r_{01}$ corrected for shrinkage

$r$, or $r^2$, with multiple preceding subscripts: combinations of the preceding
In case regression is nonlinear, one of the variables, usually $X_1$, may be transformed into a new variable, $X_1'$, producing substantial linearity, and the correlation $r_{01'}$ computed. This transformation may frequently be accomplished by a higher order parabolic regression. If of the second degree, the situation is equivalent to a multiple variable problem involving three variables, $X_0$, $X_1$, $X_1^2$, the multiple correlation notation for which is $r_{0A1,1^2}$.

$$\bar{\bar{X}}_0 = a + bX_1 + cX_1^2 \qquad [10{:}06]$$

Second degree parabolic regression of $X_0$ on $X_1$

The double bar is used to distinguish this estimated $X_0$ from that given by [10:01]. The correlation between $X_0$ and the right-hand member of [10:06] is $r_{0A1,1^2}$, and it may call for a, c, and s corrections.
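A second-degree parabola such as [10:06] can be fitted by least squares from the normal equations. The Python sketch below uses the standard library only and hypothetical data that rise and then fall. It solves the 3 × 3 system directly and then correlates $X_0$ with the fitted values. That correlation is the uncorrected correlation between $X_0$ and the right-hand member of [10:06]; no a, c, or s correction is applied.

```python
import math


def solve3(a, b):
    """Solve a 3 x 3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [v] for row, v in zip(a, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda i: abs(m[i][col]))
        m[col], m[piv] = m[piv], m[col]
        for i in range(col + 1, 3):
            f = m[i][col] / m[col][col]
            for j in range(col, 4):
                m[i][j] -= f * m[col][j]
    x = [0.0] * 3
    for i in (2, 1, 0):
        x[i] = (m[i][3] - sum(m[i][j] * x[j] for j in range(i + 1, 3))) / m[i][i]
    return x


def fit_parabola(x1, x0):
    """Least-squares a, b, c in X0 = a + b*X1 + c*X1**2 via the normal equations."""
    s = [sum(x ** k for x in x1) for k in range(5)]              # sums of powers of X1
    t = [sum(y * x ** k for x, y in zip(x1, x0)) for k in range(3)]
    a = [[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]]
    return solve3(a, t)


def pearson(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    suv = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    return suv / math.sqrt(sum((p - mu) ** 2 for p in u) * sum((q - mv) ** 2 for q in v))


x1 = [1, 2, 3, 4, 5, 6, 7, 8]                          # hypothetical
x0 = [2.1, 3.9, 5.2, 5.8, 6.1, 5.9, 5.2, 4.1]          # rises, then falls
a, b, c = fit_parabola(x1, x0)
fitted = [a + b * x + c * x * x for x in x1]
print(f"a={a:.3f} b={b:.3f} c={c:.3f}")
print("linear r:", round(pearson(x1, x0), 3), " r with parabola:", round(pearson(x0, fitted), 3))
```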

This variety of corrections would be very troublesome were it not ordinarily possible to work with such original data as not to demand them. If N, the number of cases in the sample, is large, the shrinkage correction is negligible. The grouping of measures into classes is usually upon the basis of equally spaced intervals, so the c' correction is not involved. Usually 12 or more equally spaced classes can be employed, thus making the c correction unimportant. The "correction for attenuation" adjustment yields an estimate of what the correlation would be were errors of measurement lacking. It thus answers a theoretical question and is not appropriate when the problem is to estimate an actual $X_0$ (not a hypothetical one without error) from an actual $X_1$ (not from a hypothetical one without error).

Though these corrections may not be demanded in a particular situation, the student should refrain from making them (when it is possible to make them) only after he has thought of each in turn and decided that it is inappropriate (as the a correction) or small (as the c, s, or nonlinear adjustment) for his problem. Not infrequently the logic of the problem calls for a correction for attenuation but the requisite data for making the correction are lacking. In such an instance a definitely expressed reservation in interpretation is called for.
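Where the logic of a problem does call for a correction for attenuation and reliability coefficients are available, the usual Spearman formula divides $r_{01}$ by the square root of the reliability of each variable being corrected. This section does not state the formula, so the sketch below is our own illustration, and the reliabilities .81 and .64 are hypothetical. Sampling error can push a corrected value above 1.0, which is one more reason to treat it as the answer to a theoretical question only.

```python
import math


def correct_for_attenuation(r01, rel0=None, rel1=None):
    """Spearman's correction: divide r01 by the square root of the reliability of
    each variable whose errors of measurement are to be removed.
    rel0, rel1: reliability coefficients; None leaves that variable uncorrected."""
    denom = 1.0
    for rel in (rel0, rel1):
        if rel is not None:
            if not 0 < rel <= 1:
                raise ValueError("reliability must lie in (0, 1]")
            denom *= rel
    return r01 / math.sqrt(denom)


print(correct_for_attenuation(0.50, 0.81, 0.64))  # both corrected: 0.50 / 0.72
print(correct_for_attenuation(0.50, rel0=0.81))   # X0 only: 0.50 / 0.90
```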

An important classification of correlation 
procedures depends upon the type of variable in- 
volved, as listed in Table X B. 

TABLE X B 

TYPES OF AVAILABLE VARIABLES 

(a) Two-category variable

(a-1) Values important as given, e.g., male and female.

(a-2) Values are crude measures of an underlying continuous trait, e.g., above-average and below-average intelligence.

(b) Categorical variable, consisting of several values which are not known to be quantitatively related, e.g., nationalities.

(c) Ordered variable, consisting of several values which may be placed in an order of magnitude, but not quantitatively expressed, e.g., a number of people ranked in order for sociability.

(d) Quantitative variable, e.g., height.

The field is fairly adequately covered by 
Tables X A and X B, but neither is inclusive of 
all the possible situations that should be dis- 
tinguished because of the statistical consequences 
that follow. Even so, when the various cor- 
rections are combined with the various types of 
variables there result over a hundred easily dis- 
tinguishable and definable correlation situations. 
Of all these situations the most fundamental
one is that which requires no a correction be- 
cause the variables are fully meaningful as they 
stand, no c correction because each variable is 
recorded in 12 or more equally spaced intervals, 
no s correction because the size of the sample is 
25 or greater, and no adjustment for nonlinearity. 

The data of Francis Galton upon the relationship of height of offspring and of parents, which meet all these conditions (though Galton used fewer classes than 12), will be used in the next two Sections for illustrative purposes. At the same time, an account of Galton's study provides a striking illustration of scientific induction.

SECTION 2. THE PROBLEM OF CONCOMITANT VARIATION 
IN THE SCIENCES 

A problem common to all sciences is that of 
defining and determining the cause of concomitant 
variation. In this the physical sciences have a 
great advantage over the biological and social 
sciences in that (1) errors of observation and 
measurement are usually very small in comparison 
with the measurements involved and (2) fewer 
factors are ordinarily present. In measuring 
some intellectual capacity of a group of children, 
it usually happens that the standard errors of 
the test scores obtained are greater than half 
the standard deviation of the scores of the group. 
Obviously any relationship between two capacities, 
each measured with no greater reliability than 
this, will be clouded by the errors of measure- 
ment. This is serious, but it is not the only 
difficulty. In measuring the effect of gravity, physicists can ordinarily assume that 10 pounds of lead and 10 pounds of iron will act in a similar manner, but in measuring intellect, food prices, etc., to say that one reagent, one commodity, etc., is equivalent to another with respect to the function being examined is usually questionable. Accordingly, whereas the investigations of physics lead to the establishment of "laws", those of the social sciences ordinarily lead to the discovery of "tendencies". Relationships between two psychological, biological, or social factors frequently depend upon a number of causes, each more or less independent, and no one of such importance as to dominate the situation. Under these conditions, the relationship tends to be linear. In other cases, where the true relationship is nonlinear, large errors of measurement will lessen the strength of the measurable relationship, thereby making it more difficult
to determine the exact nature of whatever curvi- 
linear relationship may exist. It is also true 
that relationships which are intrinsically curvi- 
linear when determined over a range of the two 
variables from very low to very high, may show 
practically linear relationship throughout a 
short stretch of the range. For all the reasons 
stated, a measure of relationship based upon the 
assumption of linearity is of great importance. 
Even in the case of known nonlinear relationship 
it is of much value as a point of departure. The 
equation for estimating one variable from a know- 
ledge of another linearly related to it is called 
a linear regression equation [10:01], and in 
general the best measure of mutual linear rela- 
tionship is the Pearson product-moment correla- 
tion coefficient. 

Fundamental properties of this measure of re- 
lationship were discovered and presented graphi- 
cally by Francis Galton (1877 to 1888). Galton’s 
investigations had to do with the inheritance of 
traits, and certain of the terms which he used 
would hardly have arisen if the development had 
involved other data. For example, the symbol "r" was a measure of "reversion", such, for example, as that of offspring upon mid-parent (a mid-parent measure is an average of the measures of father and mother). Later, Galton used the terms "regression" and "co-relation" and called the measure the "index of co-relation." Weldon very properly calls this measure "Galton's function", and Edgeworth, in 1892, gave it the name which has survived, "coefficient of correlation." Pearson (1920) has pointed out that the product-moment function of Bravais bears only a resemblance in form to the product-moment coefficient of correlation. Whereas Bravais started with observations which were assumed to be independent and obtained derived measures whose product moments did not equal zero, Galton started with the epoch-making concept that the original measures were dependent. Partial correlation analysis leads from related original measures to independent measures, which is exactly the reverse of the Bravais or Gaussian developments. Galton seems deserving of being called the father of correlation, though so ubiquitous a phenomenon as concomitant variation in the natural and social sciences did not completely escape his predecessors, and it has been rediscovered by many modern students working in fields remote from those in which Galton labored.

SECTION 3. FINDINGS RESULTING
FROM GALTON'S GRAPHIC TREATMENT

Galton's procedure, based upon medians and quartile deviations, has given way to the more accurate one involving the product-moment formula based upon deviations from means and standard deviations, as developed by Karl Pearson.

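$$r = r_{12} = \frac{\sum x_1x_2}{N\sigma_1\sigma_2} = \frac{\sum xy}{\sqrt{\sum x^2\sum y^2}} = \frac{\sum X_1X_2 - NM_1M_2}{\sqrt{\left(\sum X_1^2 - NM_1^2\right)\left(\sum X_2^2 - NM_2^2\right)}} \qquad [10{:}07]$$

Equivalent formulas for Pearson product-moment correlation coefficient (see also [10:27] and [10:29])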
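The three forms of [10:07] are algebraically identical, and a quick numerical check makes this concrete. The heights below are hypothetical. The raw-score form needs only the sums of $X$, $X^2$, and $X_1X_2$, which is convenient when deviation scores would be continuing decimals.

```python
import math


def r_three_ways(X1, X2):
    """Pearson r computed by three algebraically equivalent forms."""
    N = len(X1)
    M1, M2 = sum(X1) / N, sum(X2) / N
    x1 = [v - M1 for v in X1]                     # deviation scores
    x2 = [v - M2 for v in X2]
    s1 = math.sqrt(sum(v * v for v in x1) / N)    # sigma with divisor N
    s2 = math.sqrt(sum(v * v for v in x2) / N)
    sxy = sum(a * b for a, b in zip(x1, x2))
    form1 = sxy / (N * s1 * s2)
    form2 = sxy / math.sqrt(sum(v * v for v in x1) * sum(v * v for v in x2))
    # Raw-score form: only sums of X, X^2 and X1*X2 are needed.
    num = sum(a * b for a, b in zip(X1, X2)) - N * M1 * M2
    den = math.sqrt((sum(v * v for v in X1) - N * M1 ** 2) *
                    (sum(v * v for v in X2) - N * M2 ** 2))
    return form1, form2, num / den


mid_parent = [64.5, 66.5, 67.5, 68.5, 69.5, 70.5, 72.5]   # hypothetical inches
offspring = [65.2, 66.2, 67.2, 68.2, 68.2, 69.2, 70.2]
print(r_three_ways(mid_parent, offspring))
```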
We shall use Galton's data in deriving a measure of correlation. Galton obtained the heights of parents and the heights of children, and drew up a "correlation" table or "scatter diagram" showing the relationship between the two. All female heights were multiplied by 1.08 to make them comparable with male heights. This is not the soundest procedure, but in this problem it leads to no material error. Letting $X_1$, $X_2$ represent male and female heights, $\sigma_1$, $\sigma_2$ their standard deviations, and $M_1$, $M_2$ their means, it would have been better to have reduced each female height to a comparable male height by the equation

$$\text{Comparable male height} = M_1 + (X_2 - M_2)\frac{\sigma_1}{\sigma_2} \qquad [10{:}08]$$
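Equation [10:08] maps a female height onto the male scale by matching standard scores. The Python sketch below applies it to hypothetical samples. The last printed column shows Galton's simpler multiplication by 1.08 for comparison.

```python
from statistics import mean, pstdev


def comparable_male_height(x2, male, female):
    """[10:08]: M1 + (X2 - M2) * sigma1 / sigma2, i.e. give the female height
    the same standard score on the male distribution."""
    m1, m2 = mean(male), mean(female)
    s1, s2 = pstdev(male), pstdev(female)
    return m1 + (x2 - m2) * s1 / s2


male = [65.0, 67.0, 68.0, 69.0, 70.0, 71.0, 73.0]      # hypothetical inches
female = [60.0, 62.5, 63.0, 64.0, 64.5, 65.5, 67.0]
for h in (60.0, 64.0, 67.0):
    print(h, round(comparable_male_height(h, male, female), 2), round(1.08 * h, 2))
```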


CHART X I
CORRELATION BETWEEN HEIGHTS OF MIDPARENTS AND OFFSPRING
HEIGHTS OF ADULT CHILDREN EXPRESSED AS DEVIATIONS

The discussion which follows will assume that the more reliable method of transforming ("transmuting", according to Galton) female into male heights was followed and also that means were used throughout. Presumably Galton used medians, but no fundamental difference in treatment followed from such use, it simply being a slightly less reliable procedure. Galton's diagram contained the data given in the accompanying correlation table or scatter diagram, Chart X I.
Deviations being measured from 68.25 inches, 
which is a small fraction of an inch away from 
the actual means, are labeled g and £ instead of 
x and y, and account of this slight difference 
is taken in Section 5. From just such data as 
given, — in fact, it is likely that these identi- 
cal data were involved, — Galton inferred certain 
relationships which we now know hold with every 
normal correlation surface [formula 10:10]. 

(a) A plot of the means of the horizontal arrays (rows), as shown by the x's, shows the "reversion" of offspring upon height of mid-parent. Thus if the mid-parent height is 2.50 inches above the mean, the average or most probable height of offspring is 1.25 inches above the mean.

(b) The line connecting these means may be 
closely represented by a straight line through 
the origin, or intersection, of the means of the 
two distributions. This is the line showing the 
regression (or "reversion") of offspring upon
mid-parent. 

(c) There is a reversion or regression of mid-parent upon offspring. This would be represented by a straight line passing approximately through the o's. Thus for every correlation table there are two regression lines.

(d) The slopes of these two lines are equal, 
provided the standard deviations of the two dis- 
tributions are equal. 

(e) If the standard deviations are equal, this slope varies between zero and one (Galton did not suggest the existence of negative correlations), and may be represented by the symbol "r".

(f) The standard deviations of the measures found in successive arrays (successive rows, or successive columns) are approximately equal and are smaller than the standard deviations of the total distribution. Thus, if $\sigma_2$ equals the standard deviation of the heights of offspring and $\sigma_{2.1}$ equals the standard deviation of offspring corresponding to given heights of mid-parent, then

$$\sigma_{2.1}^2 = \sigma_2^2(1 - k)$$

where k is a positive quantity less than 1.0. Also, dealing with columns instead of rows,

$$\sigma_{1.2}^2 = \sigma_1^2(1 - k)$$

in which k is the same as before, $\sigma_1$ is the standard deviation of heights of mid-parents, and $\sigma_{1.2}$ is the standard deviation of heights of mid-parents corresponding to given heights of offspring. These relationships, which Galton discovered in his data, will require re-examination and restatement (see Chapter XI, Section 4) for less linear and less homoscedastic (less equal-variability of arrays) data.

(g) There is a simple relationship between k and r. It is $k = r^2$, so that

$$\sigma_{2.1}^2 = \sigma_2^2(1 - r^2), \qquad \sigma_{1.2}^2 = \sigma_1^2(1 - r^2) \qquad [10{:}09]$$

Variance of deviations from points on regression line (see [10:24] and [10:49])

(h) Each array is approximately normal if the total distributions are normal.

(i) If contour lines for different frequency
densities are drawn in the diagram, they consti- 
tute a system of similar and similarly placed 
ellipses, the conjugate diameters of which are 
the two regression lines. 

Galton made no claim to mathematical ability, 
but through sheer insight into the phenomena of 
mutual implication made these penetrating observations. He carried his conclusions, stated in probability terms, as to the nature of the correlation surface, to J. D. Hamilton Dickson (1886), a mathematician, who readily wrote down the normal correlation equation involving two variables. In the notation here used this is:

$$y = \frac{N}{2\pi\sigma_1\sigma_2\sqrt{1-r^2}}\exp\left[-\frac{1}{2(1-r^2)}\left(\frac{x_1^2}{\sigma_1^2} + \frac{x_2^2}{\sigma_2^2} - \frac{2rx_1x_2}{\sigma_1\sigma_2}\right)\right] \qquad [10{:}10]$$

Normal correlation surface, two variables
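As a numerical sanity check on [10:10], the sketch below sums the surface over a fine grid. With the leading factor N, the volume should come out very close to N. The values of $\sigma_1$, $\sigma_2$, r, and N are hypothetical.

```python
import math


def normal_surface(x1, x2, s1, s2, r, n=1.0):
    """Ordinate of the normal correlation surface [10:10] at deviations (x1, x2)."""
    q = (x1 / s1) ** 2 + (x2 / s2) ** 2 - 2 * r * x1 * x2 / (s1 * s2)
    return n / (2 * math.pi * s1 * s2 * math.sqrt(1 - r * r)) * math.exp(-q / (2 * (1 - r * r)))


s1, s2, r, N = 1.8, 2.5, 0.5, 1000      # hypothetical values
h = 0.1                                  # grid spacing, small relative to s1 and s2
grid1 = [i * h for i in range(-int(8 * s1 / h), int(8 * s1 / h) + 1)]
grid2 = [j * h for j in range(-int(8 * s2 / h), int(8 * s2 / h) + 1)]
volume = sum(normal_surface(a, b, s1, s2, r, N) for a in grid1 for b in grid2) * h * h
print(round(volume, 3))                  # close to N
```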

Galton's humility, after years of collecting data and subtly analyzing them, in the face of the neat but not involved mathematical derivation, is worthy of note by the social scientists of this day who scoff at mathematical analysis. Upon receiving the solution of his problem from Mr. Dickson, he wrote:* "I may be permitted to say that I never felt such a glow of loyalty and respect towards the sovereignty and magnificent sway of mathematical analysis as when this answer reached me, confirming, by purely mathematical reasoning, my various and laborious statistical conclusions with far more minuteness than I had dared to hope, for the original data ran somewhat roughly, and I had to smooth them with tender caution."

* See Karl Pearson, NOTES, 1920.
SECTION 4. ALGEBRAIC STATEMENT OF GALTON'S GRAPHIC
FINDINGS AND DERIVATION OF CORRELATION FORMULAS


Let us consider these discoveries in detail. Let $x_1$ stand for the height of mid-parent and $x_2$ for that of offspring, each as a deviation from its mean. Let the variances be $V_1$ and $V_2$ and the standard deviations $\sigma_1$ and $\sigma_2$, while r is the slope of the regression line in the "reduced" scatter diagram, i.e., in a correlation table in which the measures entered are $x_1/\sigma_1$ and $x_2/\sigma_2$. Galton reduced by dividing by the quartile deviations, leading to essentially the same result as here. The slopes of the two regression lines are equal, and each equals r. We will shortly obtain a value of r by means other than the graphic method of Galton. Finally, let $\bar{x}_2$ stand for an estimated height of offspring, knowing the mid-parent height, and let $\bar{x}_1$ be an estimated height of mid-parent, knowing the offspring height. An alternative and more precise notation for $\bar{x}_2$ is $x_{2A1}$, which may be read "$x_2$ dependent upon $x_1$." Similarly, an alternative notation for $\bar{x}_1$ is $x_{1A2}$. With this notation, discoveries (a) and (b) together are equivalent to

$$\frac{\bar{x}_2}{\sigma_2} = r\frac{x_1}{\sigma_1}$$

Fundamental form of regression equation [see 10:05]

Propositions (c) and (d) are equivalent to the addition of the second fundamental regression equation, as follows:

$$\frac{\bar{x}_1}{\sigma_1} = r\frac{x_2}{\sigma_2}$$

Proposition (e) is liable to misinterpretation. If r = 0, it asserts that there is no relationship between the variables (no regression of one variable upon the other), while r = 1 means complete mutual implication. More loosely stated, this latter situation will be described as one of complete dependence. So far as the data are concerned, there is no evidence that the heights of parents have any more to do with causing the heights of offspring than the heights of offspring have with causing the heights of the parents. This is still true should r = 1. A situation exists and a correlation coefficient measures the tendency of the paired observations to be related, but it gives no evidence whether $x_1$ is the cause of $x_2$, $x_2$ the cause of $x_1$, or whether the cause is unknown and lies back of both. We think of parents as causal agents in determining the heights of offspring, but we do this for reasons outside of the scatter diagram, namely, that the parents have existed earlier than the offspring in a time series.

Propositions (f) and (g) are the result of careful study of data, but Galton gave a simple proof of (g). The variability of the offspring generation is consequent to the variability of the arrays (rows) and the variability of the means of these arrays. Let $x_{2A1i}$ be the distance of the mean of the i-array of offspring heights from the mean of all offspring heights, let $V_{2.1}$ be, as before, the variance of this as well as of each of the other arrays, and let $n_i$ be the number of measures in this array. Then $(n_iV_{2.1} + n_ix_{2A1i}^2)$ is the contribution of the i-array in the calculation of the variance of the distribution, thus:

$$V_2 = \frac{\sum^k n_iV_{2.1}}{N} + \frac{\sum^k n_ix_{2A1i}^2}{N} \qquad [10{:}11]$$

Analysis of variance
in which k over the summation sign indicates the number of terms in the summation, i.e., k is the number of classes into which the variable is grouped. The first term of the right-hand member yields

$$\frac{\sum^k n_iV_{2.1}}{N} = V_{2.1}\frac{\sum^k n_i}{N} = V_{2.1} \qquad [10{:}12]$$


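Since, for any array,

$$x_{2A1i} = r\frac{\sigma_2}{\sigma_1}x_{1i} \qquad [10{:}13]$$

and since

$$\sum^k n_ix_{1i}^2 = \sum x_1^2 = NV_1 \qquad [10{:}14]$$

we obtain for the second right-hand term of [10:11] $V_{2A1}$, the variance of the points on the regression line:

$$V_{2A1} = \frac{\sum^k n_ix_{2A1i}^2}{N} = r^2\frac{V_2}{V_1}\cdot\frac{\sum^k n_ix_{1i}^2}{N} = r^2V_2 \qquad [10{:}15]$$

Variance of estimated x's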


It is important to distinguish between this, the variance of the points on the regression line, and [10:50], the variance error of a point on the regression line.

$V_{2A1}$ is read as "the variance of that part of $x_2$ that is dependent upon $x_1$."

Accordingly, we have

$$V_2 = V_{2.1} + V_{2A1} \qquad [10{:}16]$$

Total variance in terms of the variance of arrays and of the variance of points on linear regression line
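The split of the total variance into a within-array part and a part due to the array means holds exactly for any grouped data. In this hypothetical Python example the array variances are not forced to be equal, so the "within" term is their weighted mean. That term plays the role of $V_{2.1}$ when the arrays are homoscedastic. The "between" term equals $r^2V_2$ only when the array means lie on a straight line.

```python
from collections import defaultdict


def variance(vals):
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals) / len(vals)


def decompose(x1_class, x2):
    """Split the variance of x2 into the weighted mean within-array variance and
    the weighted variance of the array means, arrays being the x1 classes."""
    arrays = defaultdict(list)
    for c, y in zip(x1_class, x2):
        arrays[c].append(y)
    N = len(x2)
    grand = sum(x2) / N
    within = sum(len(a) * variance(a) for a in arrays.values()) / N
    between = sum(len(a) * (sum(a) / len(a) - grand) ** 2 for a in arrays.values()) / N
    return variance(x2), within, between


x1_class = [-2, -2, -1, -1, -1, 0, 0, 0, 0, 1, 1, 1, 2, 2]          # hypothetical
x2 = [-1.5, -0.5, -1.0, 0.0, -0.5, -0.5, 0.5, 0.0, 1.0, 0.5, 0.0, 1.5, 1.0, 2.0]
total, within, between = decompose(x1_class, x2)
print(total, within + between)    # the two agree
```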



$$V_2 = V_{2.1} + r^2V_2 \qquad [10{:}17]$$

Transposing terms

$$V_{2.1} = V_2(1 - r^2) \qquad [10{:}18]$$

Variance of an array

Also

$$V_{1.2} = V_1(1 - r^2) \qquad [10{:}18a]$$
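For the least-squares line, the identity $V_{2.1} = V_2(1 - r^2)$ holds exactly in any sample, not only for normal data. The Python sketch below simulates a hypothetical linear relation. It then compares the variance of the errors of estimate with $V_2(1 - r^2)$.

```python
import math
import random

random.seed(1)
n = 500
x1 = [random.gauss(0, 2.0) for _ in range(n)]
x2 = [0.6 * a + random.gauss(0, 1.5) for a in x1]      # hypothetical linear relation


def center(v):
    m = sum(v) / len(v)
    return [u - m for u in v]


d1, d2 = center(x1), center(x2)
V1 = sum(u * u for u in d1) / n
V2 = sum(u * u for u in d2) / n
c12 = sum(a * b for a, b in zip(d1, d2)) / n
r = c12 / math.sqrt(V1 * V2)
b21 = c12 / V1                                          # least-squares slope
V21 = sum((b - b21 * a) ** 2 for a, b in zip(d1, d2)) / n   # variance error of estimate
print(V21, V2 * (1 - r * r))
```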

This derivation proves Galton's proposition (g), that the reducing factor is equal to $r^2$.

(h) is an experimental finding which, coupled with (g) and (a), (b), (c), and (d), quickly gives the equation of the normal bivariate correlation surface.

A normal equation, with area 1.0, is given herewith:

$$y = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{x^2}{2\sigma^2}\right) \qquad [10{:}19]$$

Let any integral value of x include all cases with values between (x − .5) and (x + .5). Further, y is the ordinate at the point x, so that y × 1 is the area of a rectangle with ordinate y and base from (x − .5) to (x + .5). Thus, if (interval/σ) is small (and let us here employ such units of measurement that this is so), this rectangle closely approximates the curve for the region from (x − .5) to (x + .5), and y is the probability that a case drawn at random will have the value x.
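The rectangle argument can be checked by comparing the ordinate of [10:19] with the exact probability of the class from (x − .5) to (x + .5). In this sketch σ = 10 is a hypothetical choice that makes interval/σ small. With σ = 1 the agreement is noticeably worse.

```python
import math
from statistics import NormalDist


def ordinate(x, sigma):
    """[10:19]: height of the unit-area normal curve at deviation x."""
    return math.exp(-x * x / (2 * sigma * sigma)) / (sigma * math.sqrt(2 * math.pi))


sigma = 10.0                      # interval of 1 is small relative to sigma (hypothetical)
dist = NormalDist(0, sigma)
for x in (0, 5, 10, 20):
    exact = dist.cdf(x + 0.5) - dist.cdf(x - 0.5)   # probability of the class x
    print(x, round(ordinate(x, sigma), 6), round(exact, 6))
```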

Referring to the two-dimensional scatter diagram, a similar statement for the probability that a measure in the $x_2$ array shall have the value $x_1$ is $y_{1.2}$, the ordinate for variable values of $x_1$ for a fixed value of $x_2$:

$$y_{1.2} = \frac{1}{\sigma_1\sqrt{1-r^2}\sqrt{2\pi}}\exp\left[-\frac{\left(x_1 - r\frac{\sigma_1}{\sigma_2}x_2\right)^2}{2V_1(1-r^2)}\right]$$
The probability that a measure will lie in the $x_2$ array is

$$y_2 = \frac{1}{\sigma_2\sqrt{2\pi}}\exp\left(-\frac{x_2^2}{2\sigma_2^2}\right) \qquad \text{(see [10:19])}$$


The probability of the joint occurrence, or that a case chosen at random will have the value $x_2$ and the value $x_1$, is the product of these two probabilities, or $y = y_2y_{1.2}$, which reduces to

$$y = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-r^2}}\exp\left[-\frac{1}{2(1-r^2)}\left(\frac{x_1^2}{V_1} + \frac{x_2^2}{V_2} - \frac{2rx_1x_2}{\sigma_1\sigma_2}\right)\right] \qquad [10{:}21]$$

Normal bivariate distribution of volume 1.0


Multiplying the right-hand member by N yields the normal bivariate distribution with number of cases equal to N, which Dickson wrote down to satisfy the experimental findings that Galton reported to him.
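The factorization $y = y_2y_{1.2}$ can be verified numerically. At every point, the product of the marginal ordinate for $x_2$ and the array ordinate for $x_1$ equals the volume-1.0 surface [10:21]. The parameter values are hypothetical.

```python
import math


def y2(x2, s2):
    """Marginal normal ordinate for x2."""
    return math.exp(-x2 ** 2 / (2 * s2 ** 2)) / (s2 * math.sqrt(2 * math.pi))


def y1_given_2(x1, x2, s1, s2, r):
    """Array distribution of x1 for fixed x2: mean r*(s1/s2)*x2, variance V1*(1 - r^2)."""
    v = s1 ** 2 * (1 - r * r)
    mu = r * s1 / s2 * x2
    return math.exp(-(x1 - mu) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)


def joint(x1, x2, s1, s2, r):
    """[10:21]: normal bivariate surface of volume 1.0."""
    q = x1 ** 2 / s1 ** 2 + x2 ** 2 / s2 ** 2 - 2 * r * x1 * x2 / (s1 * s2)
    return math.exp(-q / (2 * (1 - r * r))) / (2 * math.pi * s1 * s2 * math.sqrt(1 - r * r))


s1, s2, r = 1.8, 2.5, 0.45          # hypothetical
points = [(0.0, 0.0), (1.2, -0.7), (-2.5, 3.1), (3.0, 4.0)]
for x1, x2 in points:
    print(joint(x1, x2, s1, s2, r), y2(x2, s2) * y1_given_2(x1, x2, s1, s2, r))
```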


SECTION 5. DERIVATION OF COMPUTATIONAL FORMULAS FOR b AND r


The magnitude r has, to this point, been defined as the slope of the regression line in case standard scores (see [10:05] and [10:05a]) are the measures entered into the correlation table. We will now prove that in any scatter diagram the two "best fit" linear regression lines are, as recorded, [10:04] and [10:04a], in which r is readily computable by formula [10:29] and the b's by formulas [10:30] and [10:31].

The term "best fit" is used as in the method 
of least squares. A "best fit" determination is 
one in which the variance error of estimate is 
minimal. Determinations can be made resulting 
in the sum of the deviations, of the cubes of the 
deviations, of their fourth powers, etc., being 
a minimum, but since the days of Gauss, it has 
been known that in the case of a normal distri- 
bution none of these determinations will result 
in as small a median error as one in which the 
sum of the squares of the errors of estimate is 
made a minimum. The constants of distributions
which are quite divergent from the normal, so 
determined that the variance error is minimal, 
are undoubtedly excellent determinations, but it 
is no longer possible to say that constants so 
calculated have smaller median errors than would 
others derived upon a different principle. In 
derivations herewith the Gaussian principle of 
least squares is adhered to. 

In Chart X I the regression line drawn is that of $x_2$ upon $x_1$. Its equation is

$$\bar{x}_2 = r\frac{\sigma_2}{\sigma_1}x_1 = b_{21}x_1 \qquad [10{:}22]$$

The "slope" of the regression line is $b_{21}$ and equals $\tan\varphi$. Having given a value of $x_1$, the best estimate of the corresponding $x_2$ is $\bar{x}_2$, given by [10:22]. In general $\bar{x}_2$ will not be identical with the actual or experimentally obtained value of $x_2$, so that

$$x_{2.1} = x_2 - \bar{x}_2 \qquad [10{:}23]$$

is an error of estimate. The variance error of estimate is given by
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Variance e r- 


<1 = ^2.1 


l(x 2 ~ X 2 ) z ror_°f es-t «- 


N 


mate (see 

[l0:09] and 
[lO: 4-9 J ) 


[10:24] 


The value of $b_{21}$ which makes this minimal is the value sought.

$$NV_{2.1} = \sum(x_2 - \tilde{x}_2)^2 = \sum(x_2 - b_{21}x_1)^2 = \sum x_2^2 - 2b_{21}\sum x_1x_2 + b_{21}^2\sum x_1^2 = NV_2 - 2Nb_{21}c_{12} + Nb_{21}^2V_1$$

in which $c_{12}$, called a covariance, or product moment, equals $(\sum x_1x_2)/N$. Dividing by N and completing the square on the last two terms, we have

$$V_{2.1} = V_2 + \left(b_{21}^2V_1 - 2b_{21}c_{12} + \frac{c_{12}^2}{V_1}\right) - \frac{c_{12}^2}{V_1} = V_2 - \frac{c_{12}^2}{V_1} + \left(b_{21}\sigma_1 - \frac{c_{12}}{\sigma_1}\right)^2$$

As the ( ) term is squared, it can never be negative, so obviously $V_{2.1}$ takes its proper, or smallest, value when the ( ) term, which is the only term containing $b_{21}$, equals zero. Solving,

$$b_{21} = \frac{c_{12}}{V_1} = \frac{\sum x_1x_2}{\sum x_1^2} \qquad \text{Regression coefficient of } x_2 \text{ upon } x_1 \quad [10:25]$$

By parity,

$$b_{12} = \frac{c_{12}}{V_2} = \frac{\sum x_1x_2}{\sum x_2^2} \qquad \text{Regression coefficient of } x_1 \text{ upon } x_2 \quad [10:26]$$

The correlation coefficient is either regression coefficient when standard scores are involved, since they are then equal. To show that this common value is r as given by [10:07] is left as an exercise.

When $b_{21} \neq b_{12}$ we can note that the sign of both b's is that of $\sum x_1x_2$, so that

$$r_{12} = \sqrt{b_{12}b_{21}} \qquad [10:27]$$

with the sign attached to r that is the sign of $b_{12}$ or of $b_{21}$.
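As a check on [10:25] through [10:27], the following Python sketch does three things with a small, made-up set of paired measures. It computes both regression coefficients from deviation scores. It confirms that r equals $\pm\sqrt{b_{12}b_{21}}$, taking the sign of $\sum x_1x_2$. It also shows that the least-squares slope gives a smaller variance error of estimate than nearby trial slopes. The function names and data are illustrative assumptions, not part of the text.

```python
import math

def regression_constants(x1_raw, x2_raw):
    """Return b21, b12, r and the deviation scores x1, x2."""
    n = len(x1_raw)
    m1 = sum(x1_raw) / n
    m2 = sum(x2_raw) / n
    x1 = [v - m1 for v in x1_raw]          # deviation scores
    x2 = [v - m2 for v in x2_raw]
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    b21 = s12 / s11                        # [10:25]: x2 upon x1
    b12 = s12 / s22                        # [10:26]: x1 upon x2
    r = math.copysign(math.sqrt(b12 * b21), s12)   # [10:27], sign of sum x1*x2
    return b21, b12, r, x1, x2

def v21(b, x1, x2):
    """Variance error of estimate V_2.1 for a trial slope b, as in [10:24]."""
    return sum((e2 - b * e1) ** 2 for e1, e2 in zip(x1, x2)) / len(x1)

x1_raw = [64, 66, 67, 68, 70, 72]          # hypothetical data
x2_raw = [65, 67, 66, 69, 70, 71]
b21, b12, r, x1, x2 = regression_constants(x1_raw, x2_raw)
print(f"b21 = {b21:.4f}, b12 = {b12:.4f}, r = {r:.4f}")
# The least-squares slope gives a smaller V_2.1 than nearby trial slopes.
print(v21(b21, x1, x2), v21(b21 + 0.05, x1, x2), v21(b21 - 0.05, x1, x2))
```

The minimum of $V_{2.1}$ equals $V_2 - c_{12}^2/V_1$, which is the completed-square result with the ( ) term set to zero.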

The formulas for r and b based upon deviation scores, $x_1$ and $x_2$, are unsatisfactory for computational purposes because $x_1$ and $x_2$ would ordinarily be continuing decimals. We require a method utilizing integral values corresponding to the midpoints of the classes into which the $X_1$ and $X_2$ scores have been grouped. To avoid a notation calling for subscripts of subscripts, we let X, x, and ξ have meanings for the $X_1$ variable as already defined [Ch. VI, Sec. 6]. The grouping interval in this first variable is $i_1$. We let Y, y, ζ, and $i_2$ have corresponding meanings for the $X_2$ variable. Formulas for the evaluation of $M_1$, $V_1$, $M_2$, and $V_2$ in terms of ξ and ζ scores and of $i_1$ and $i_2$ have already been given, so we only need to express $c_{12}$ in these terms to have available computational procedures based upon the convenient ξ and ζ scores.

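$$Nc_{12} = \sum xy = \sum\left[(\xi - M_\xi)i_1\right]\left[(\zeta - M_\zeta)i_2\right] = i_1i_2\left(\sum\xi\zeta - M_\xi\sum\zeta - M_\zeta\sum\xi + NM_\xi M_\zeta\right)$$

Since $\sum\xi = NM_\xi$ and $\sum\zeta = NM_\zeta$, the last three terms within the parentheses reduce to $-NM_\xi M_\zeta$, so that

$$c_{12} = \frac{i_1i_2\left(\sum\xi\zeta - NM_\xi M_\zeta\right)}{N} \qquad \text{Covariance from } \xi \text{ and } \zeta \text{ scores} \quad [10:28]$$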

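Utilizing this with earlier formulas we have

$$r_{12} = \frac{c_{12}}{\sigma_1\sigma_2} = \frac{\sum\xi\zeta - NM_\xi M_\zeta}{N\sigma_\xi\sigma_\zeta} \qquad \text{Work formula for } r \quad [10:29]$$

$$b_{12} = \frac{c_{12}}{V_2} = \frac{i_1\left(\sum\xi\zeta - NM_\xi M_\zeta\right)}{i_2NV_\zeta} \qquad \text{Work formula for } b_{12} \quad [10:30]$$

$$b_{21} = \frac{c_{12}}{V_1} = \frac{i_2\left(\sum\xi\zeta - NM_\xi M_\zeta\right)}{i_1NV_\xi} \qquad \text{Work formula for } b_{21} \quad [10:31]$$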


Computing the correlation and regression constants for the Galton data by means of these formulas, we have

N = 324
Arbitrary origin for $X_1$ = 68.25, and $i_1$ = .5
Arbitrary origin for $X_2$ = 68.25, and $i_2$ = .5

$$M_\xi = \frac{38}{324} = .1173, \text{ so that } M_1 = 68.25 + .5(.1173) = 68.31$$

$$M_\zeta = \frac{-20}{324} = -.0617, \text{ so that } M_2 = 68.25 + .5(-.0617) = 68.22$$

$$V_\xi = \frac{2740}{324} - (.1173)^2 = 8.4430, \text{ so } V_1 = 8.4430(.5)^2 = 2.1108 \text{ and } \sigma_1 = 1.4528$$

$$V_\zeta = \frac{6320}{324} - (-.0617)^2 = 19.5024, \text{ so } V_2 = 19.5024(.5)^2 = 4.8756 \text{ and } \sigma_2 = 2.2081$$

$$c_{12} = \frac{(.5)(.5)\left[1618 - 324(.1173)(-.0617)\right]}{324} = 1.2503$$

$$b_{12} = \frac{1.2503}{4.8756} = .2564$$

$$b_{21} = \frac{1.2503}{2.1108} = .5923$$

$$r_{12} = \frac{1.2503}{(1.4528)(2.2081)} = .3897$$

$$\tilde{X}_2 = 27.76 + .5923X_1 \qquad \text{(substituting in [10:01a])}$$

The reliability of these correlation constants, and accordingly the number of decimal places to be retained, are discussed in Sections 6 and 7 of this chapter.
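The work formulas can be sketched in Python as follows. The summary sums are those used in the Galton computation above: Σξ = 38, Σζ = -20, Σξ² = 2740, Σζ² = 6320 and Σξζ = 1618. The function and its argument names are hypothetical conveniences.

```python
import math

def work_formulas(n, sum_xi, sum_zeta, sum_xi2, sum_zeta2, sum_xizeta,
                  origin1, i1, origin2, i2):
    """Correlation constants from grouped (xi, zeta) step scores."""
    m_xi, m_zeta = sum_xi / n, sum_zeta / n
    v_xi = sum_xi2 / n - m_xi ** 2
    v_zeta = sum_zeta2 / n - m_zeta ** 2
    m1, m2 = origin1 + i1 * m_xi, origin2 + i2 * m_zeta
    v1, v2 = v_xi * i1 ** 2, v_zeta * i2 ** 2
    c12 = i1 * i2 * (sum_xizeta - n * m_xi * m_zeta) / n     # [10:28]
    r12 = c12 / math.sqrt(v1 * v2)                           # [10:29]
    b12 = c12 / v2                                           # [10:30]
    b21 = c12 / v1                                           # [10:31]
    intercept = m2 - b21 * m1       # constant of the X2-upon-X1 line
    return dict(m1=m1, m2=m2, v1=v1, v2=v2, c12=c12, r12=r12,
                b12=b12, b21=b21, intercept=intercept)

galton = work_formulas(324, 38, -20, 2740, 6320, 1618, 68.25, .5, 68.25, .5)
for name, value in galton.items():
    print(f"{name:>9} = {value:.4f}")
```

Because the sketch carries full precision, its printed values may differ from the text's figures in the last place shown. The constant of the regression line comes out near 27.76.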

The computation shown on Chart XI is straightforward, but it does not provide as adequate checks upon numerical accuracy as are desirable. Many forms have been devised to make the steps involved mechanical and such that nonstatistically trained clerks can compute correlational constants rapidly and accurately. Some of these are especially designed for use with computing machines. The form given in Appendix C is an example of one that is serviceable either with or without a computing machine. It is a little longer than some of the forms, but it provides more complete checks than usual, for summations are obtained in two independent ways. The steps to be followed will not be explained in detail, as they are practically self-explanatory, but attention is called to the following facts:
- s = ξ + ζ;
- d = ξ - ζ;
- the italic entries are the computational or manual entries;
- "LL-UR diag" indicates that cells in a line from the lower left to the upper right all have the same d-values, so that summations of frequencies along such a line give all the frequencies having the particular d-value recorded at the end of this line at the bottom or top of the scatter diagram;
- cells in a line from the upper left to the lower right, "UL-LR diag," all have the same s-values, so that summations along such a line give all the frequencies having the particular s-value recorded at the right or the left of the scatter diagram;
- the value printed in the lower right corner of each cell is the value of the product for that cell.
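The form in Appendix C is not reproduced here. The sketch below illustrates the kind of two-way check that such a form makes possible; this is an assumption about the form, not a description of it. It uses two identities that follow from s = ξ + ζ and d = ξ - ζ: Σs² - Σd² = 4Σξζ and Σs² + Σd² = 2(Σξ² + Σζ²). The frequency table is invented.

```python
from collections import Counter

# (xi, zeta) cell -> frequency; a hypothetical scatter diagram
cells = {(-1, -1): 3, (-1, 0): 2, (0, 0): 5, (0, 1): 2,
         (1, 0): 1, (1, 1): 4, (2, 1): 2}

direct = sum(f * xi * zeta for (xi, zeta), f in cells.items())
sum_xi2 = sum(f * xi * xi for (xi, zeta), f in cells.items())
sum_zeta2 = sum(f * zeta * zeta for (xi, zeta), f in cells.items())

# Frequencies summed along diagonals, as on the form
s_freq, d_freq = Counter(), Counter()
for (xi, zeta), f in cells.items():
    s_freq[xi + zeta] += f      # UL-LR diagonals share the same s
    d_freq[xi - zeta] += f      # LL-UR diagonals share the same d
sum_s2 = sum(f * s * s for s, f in s_freq.items())
sum_d2 = sum(f * d * d for d, f in d_freq.items())

# Two independent routes to the product sum should agree
print(direct, (sum_s2 - sum_d2) / 4)
```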

SECTION 6. THE REGRESSION EQUATION AND THE 
ANALYSIS OF VARIANCE 

We have noted [10:16] that, having two correlated variables, $X_0$ and $X_1$, the total variance $V_0$ is divisible into two independent parts. One part is capable of being estimated from $X_1$, and the other, $V_{0.1}$, is independent of $X_1$; thus

$$V_0 = V_{\tilde{x}_0} + V_{0.1} \qquad \ldots \text{ see [10:16]}$$

$$(N-1) = 1 + (N-2) \qquad \text{Degrees of freedom equation} \quad [10:32]$$

The degrees of freedom of these three variances are (N - 1), 1, and (N - 2), and, as always, a degrees of freedom equation corresponding to a variance equation can be written. As shown in Chapter VI, the number of degrees of freedom of $V_0$ is (N - 1). Clearly, having a given set of $X_1$ values, the variance of $\tilde{X}_0$ (identical with the variance of $\tilde{x}_0$) is consequent to one constant, $b_{01}$, and thus has one degree of freedom. The residual variance, $V_{0.1}$, must then have exactly (N - 2) degrees of freedom.

Or, we can determine this by noting that the residual deviations, $x_{0.1}$, are those after two linear restrictions (leading to $M_0$ and $b_{01}$) have been placed upon the $X_0$ measures. Since $M_0$ is not known a priori, but found from the sample, this constitutes placing the linear restriction

$$\sum X_0 = NM_0 \qquad [10:33]$$

upon the $X_0$'s. Since $b_{01}$ is not known a priori, but found from the sample, this constitutes placing the added linear restriction

$$\sum X_0X_1 = NM_0M_1 + NV_1b_{01} \qquad [10:34]$$

upon the $X_0$'s. Having given, i.e., fixed, values of $X_1$, the summation $\sum X_0X_1$ is of course linear in $X_0$. When fitting parabolic regression lines, further linear restrictions of the sort $\sum X_0X_1^2$, $\sum X_0X_1^3$, etc., are involved.
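The two linear restrictions [10:33] and [10:34] can be illustrated with a minimal sketch on made-up data. Once the least-squares line is fitted, the residuals $x_{0.1}$ sum to zero and have a zero product sum with $X_1$. Only N - 2 of them are therefore free.

```python
def fit_and_residuals(x0, x1):
    """Least-squares regression of X0 upon X1 and its residuals x_{0.1}."""
    n = len(x0)
    m0, m1 = sum(x0) / n, sum(x1) / n
    v1 = sum((a - m1) ** 2 for a in x1) / n
    c01 = sum((a - m1) * (b - m0) for a, b in zip(x1, x0)) / n
    b01 = c01 / v1
    resid = [b - m0 - b01 * (a - m1) for a, b in zip(x1, x0)]
    return m0, m1, v1, b01, resid

x1 = [2.0, 3.0, 5.0, 7.0, 8.0]      # hypothetical fixed X1 values
x0 = [1.0, 4.0, 4.0, 6.0, 9.0]      # hypothetical X0 values
m0, m1, v1, b01, resid = fit_and_residuals(x0, x1)
n = len(x0)
# Both restrictions hold (up to rounding):
print(sum(resid), sum(e * a for e, a in zip(resid, x1)))
# [10:34]: sum(X0*X1) equals N*M0*M1 + N*V1*b01
print(sum(a * b for a, b in zip(x0, x1)), n * m0 * m1 + n * v1 * b01)
```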

The sample at hand is to be conceived as one of an indefinitely large number, all of which have identically the same set of $X_1$ values. Thus the parent population varies only in having different $X_0$ values from sample to sample.

The two parameters whose distributions we are concerned with, should many samples be taken, are $M_0$ and $b_{01}$, as indicated in [10:33] and [10:34]. Let the first hypothesis that we desire to test be that $M_0$ is a chance deviate from $\tilde{M}_0$, and the second that $b_{01}$ is a chance deviate from $\tilde{b}_{01}$. We make the first test under the assumption that $b_{01} = \tilde{b}_{01}$ and the second under the assumption that $M_0 = \tilde{M}_0$. Otherwise expressed, we test the hypothesis that $M_0$ is a chance deviate from $\tilde{M}_0$ for the universe of samples having the regression $\tilde{b}_{01}$. We also test the hypothesis that $b_{01}$ is a chance deviate from $\tilde{b}_{01}$ for the universe of samples having the mean $\tilde{M}_0$. Actual equation [10:35] and theoretical equation [10:36] are the basis for the first test, and equations [10:35] and [10:37] are the basis for the second.

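$$X_0 - M_0 = b_{01}x_1 + x_{0.1} \qquad [10:35]$$

$$X_0 - \tilde{M}_0 = b_{01}x_1 + \tilde{x}_{0.1} \qquad \text{Hypothesis when } \tilde{M}_0 \text{ is being tested} \quad [10:36]$$

$$X_0 - M_0 = \tilde{b}_{01}x_1 + \tilde{x}_{0.1} \qquad \text{Hypothesis when } \tilde{b}_{01} \text{ is being tested} \quad [10:37]$$

Subtracting [10:36] from [10:35] and transposing terms, we obtain

$$\tilde{x}_{0.1} = (M_0 - \tilde{M}_0) + x_{0.1}$$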

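$\tilde{x}_{0.1}$ is thus divided into two independent parts. Because $\sum x_{0.1} = 0$, the cross-product term vanishes, and for the first test we have

$$\frac{\sum\tilde{x}_{0.1}^2}{N} = (M_0 - \tilde{M}_0)^2 + V_{0.1}$$

in which $(M_0 - \tilde{M}_0)^2$ has one degree of freedom and $V_{0.1}$ has (N - 2), so we obtain the variance ratio

$$F_{1,(N-2)} = \frac{(M_0 - \tilde{M}_0)^2(N-2)}{V_0(1-r_{01}^2)} \qquad \text{F to test hypothesis true mean} = \tilde{M}_0 \text{ when true regression} = \tilde{b}_{01} \quad [10:38]$$

Utilizing [10:35] and [10:37] we similarly obtain

$$F_{1,(N-2)} = \frac{(b_{01} - \tilde{b}_{01})^2V_1(N-2)}{V_0(1-r_{01}^2)} \qquad \text{F to test hypothesis true regression} = \tilde{b}_{01} \text{ when true mean} = \tilde{M}_0 \quad [10:39]$$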
The qualifications next to formula numbers [10:38] and [10:39], "when true regression = $\tilde{b}_{01}$" and "when true mean = $\tilde{M}_0$," may ordinarily be considered immaterial because in general the correlation between mean and regression coefficient approximates zero.

The square root of $F_{1,(N-2)}$ is "Student's t," and if r < .90 and N > 25 (in fact, for most purposes if N > 12) it may be interpreted as a deviate in a unit normal distribution.

The one-third-sigma rule for the number of decimal places to keep when publishing, given in Chapter VI, Section 7, is based upon the value of the standard error of the item in question. The square root of a variance ratio is a critical ratio, so from [10:38] we note that

$$\sigma_{M_0}^2 = \frac{V_0(1-r_{01}^2)}{N-2} \qquad \text{Variance error of } M_0 \text{ determined from correlated data} \quad [10:40]$$

From this we obtain the standard error and apply the one-third-sigma rule. For the Galton data,

$$\sigma_{M_1}^2 = \frac{2.1108(1 - .3897^2)}{322} = .00556; \quad \sigma_{M_1} = .075; \quad \text{and} \quad \frac{\sigma_{M_1}}{3} = .025.$$

We accordingly publish $M_1$ as 68.31.
From [10:39] we note that

$$\sigma_{b_{01}}^2 = \frac{V_0(1-r_{01}^2)}{V_1(N-2)} \qquad \text{Variance error of } b_{01} \text{ (cf. Chapter XIII)} \quad [10:41]$$

This, with formulas [6:58] and [6:60], provides the basis for expressing published results as herewith:

$M_1$ = 68.31; $V_1$ = 2.11; $\sigma_1$ = 1.45; $b_{12}$ = .26;
$M_2$ = 68.22; $V_2$ = 4.9; $\sigma_2$ = 2.21; $b_{21}$ = .59;
and $r_{12}$ = .39.
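The following sketch evaluates [10:40] and [10:41] for the Galton constants. For each, it prints σ/3, the quantity the one-third-sigma rule of Chapter VI works from; the rule itself is not restated here. The sketch also applies [10:39] with a hypothesized regression of zero. Up to the rounding of $b_{21}$, this should reproduce the [10:42] value for zero correlation given in Section 7.

```python
import math

N, r = 324, 0.3897
V1, V2 = 2.1108, 4.8756
b21 = 0.5923

def var_error_mean(v0, r01, n):
    """[10:40]: variance error of the mean M0 from correlated data."""
    return v0 * (1 - r01 ** 2) / (n - 2)

def var_error_slope(v0, v1, r01, n):
    """[10:41]: variance error of b01, the regression of X0 upon X1."""
    return v0 * (1 - r01 ** 2) / (v1 * (n - 2))

sd_m1 = math.sqrt(var_error_mean(V1, r, N))
sd_b12 = math.sqrt(var_error_slope(V1, V2, r, N))   # X1 upon X2
sd_b21 = math.sqrt(var_error_slope(V2, V1, r, N))   # X2 upon X1
for name, sd in (("M1", sd_m1), ("b12", sd_b12), ("b21", sd_b21)):
    print(f"sigma_{name} = {sd:.4f}   sigma/3 = {sd / 3:.4f}")

# [10:39] with hypothesized regression 0, compared with [10:42]
F_39 = b21 ** 2 * V1 * (N - 2) / (V2 * (1 - r ** 2))
F_42 = r ** 2 * (N - 2) / (1 - r ** 2)
print(f"F by [10:39] = {F_39:.2f}, F by [10:42] = {F_42:.2f}")
```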

SECTION 7. THE TRUSTWORTHINESS OF A CORRELATION COEFFICIENT

The term $(b_{01} - \tilde{b}_{01})^2V_1$ in [10:39] equals $[r_{01}\sigma_0 - \text{true}(r_{01}\sigma_0)]^2$. Suppose that, instead of assigning a hypothetical value to the product of $r_{01}$ and $\sigma_0$, we assign such a value to $r_{01}$ and consider all samples having the standard deviation $\sigma_0$. The right-hand member of [10:39] then becomes

$$\frac{(r_{01} - \tilde{r}_{01})^2(N-2)}{1 - r_{01}^2}$$

At first sight this looks as though it provided an $F_{1,(N-2)}$ test for $r_{01}$, but this is not so. When fixing the value of $\sigma_0$, we have imposed the condition $\sum x_0^2 = NV_0$, and this is not linear in the $x_0$'s.

However, in the special case in which the hypothesis is that $\tilde{r}_{01} = 0$, there are such cancellations that [10:39], or the similar expression interchanging variables, becomes

$$F_{1,(N-2)} = \frac{r_{01}^2(N-2)}{1 - r_{01}^2} \qquad \text{F to test hypothesis that true correlation} = 0 \quad [10:42]$$

and we have a sound test of the significance of $r_{01}$.

We will mention three unequally excellent procedures which are available when $\tilde{r}_{01}$ does not equal zero. For the most precise evaluation for samples of 25 or smaller we may use David's Tables of the ordinates and probability integral of the distribution of r in small samples (1938).

A second method, which has valuable combinatorial properties in addition to excellent precision in the interpretation of single correlation coefficients, is the r-into-z technique of Fisher (1920-21).* Using "ln" to designate a natural logarithm and "log" to designate one to the base 10, the transformation is

$$z = \frac{1}{2}\left[\ln(1+r) - \ln(1-r)\right] = \frac{1}{2}\ln\frac{1+r}{1-r} \qquad \text{Fisher's } r\text{-into-}z \text{ transformation} \quad [10:43]$$

Since $\ln a = 2.3025851\log a$, we may write this using logarithms to the base 10:

$$z = 1.1512925\left[\log(1+r) - \log(1-r)\right] = 1.1512925\log\frac{1+r}{1-r} \qquad [10:43a]$$

Fisher has shown that z has the happy property 
of being very nearly normally distributed with a 
standard deviation which is a function of the 
size of the sample only: 

$$\sigma_z = \frac{1}{\sqrt{N-3}} \qquad \text{Standard error for values of } z \quad [10:44]$$

If we postulate the value $\tilde{r}$, we apply the r-into-z transformation to both r and $\tilde{r}$, obtaining z and $\tilde{z}$. The critical ratio that concerns us is

$$\frac{z - \tilde{z}}{1/\sqrt{N-3}} = (z - \tilde{z})\sqrt{N-3} \qquad \text{Normally distributed critical ratio to test } (r - \tilde{r}) \quad [10:45]$$

* Herein is a table of z for values of r, and in Fisher, STATISTICAL METHODS FOR RESEARCH WORKERS (1925 et seq.), is a table of r for values of z. A more detailed table of this relationship is given in columns "u" and "tanh u" of Table H of SMITHSONIAN MATH. TABLES, Hyperbolic Functions (1909 et seq.).

The third and older method is to obtain a critical ratio by dividing r by its standard error. This standard error is frequently taken as $(1 - r^2)/\sqrt{N}$, but an improved result is gotten if N is replaced by the number of degrees of freedom, N - 2 (though it is true that one of the two degrees of freedom lost does not represent a linear restriction).

$$\sigma_r = \frac{1 - r^2}{\sqrt{N-2}} \qquad \text{Standard error of a total correlation coefficient} \quad [10:46]$$


Many derivations of $\sigma_r$ have been made, with slightly different outcomes. Perhaps the best first approximation is

$$\sigma_r = \frac{1 - r^2}{\sqrt{N - 2 - 4.5r^2}} \qquad [10:46a]$$

which is obtained by combining two of Romanowsky's formulas.*

The critical ratio that now concerns us is

$$\frac{r - \tilde{r}}{\sigma_r} = \frac{(r - \tilde{r})\sqrt{N-2}}{1 - r^2} \qquad \text{Non-normally distributed critical ratio to test } (r - \tilde{r}) \quad [10:47]$$

If N is large (say N > 15) or r small (say r < .9), and particularly if both of these are true, [10:47] will give very serviceable results if we assume r to be normally distributed. Otherwise, the lack of knowledge of the form of distribution of [10:47] is serious.

* V. Romanowsky, On the moments of standard deviation and of correlation coefficients in samples from a normal population, METRON, Vol. 5, no. 4, 3-46, 1925, formulas [121] and [124].

For the Galton data, where variables were labeled $X_1$ and $X_2$, we can test by [10:42] the hypothesis that there is zero correlation between mid-parent height and offspring height:

$$F_{1,322} = \frac{(.3897)^2\,322}{1 - (.3897)^2} = 57.67$$

This is so large a squared critical ratio that there is negligible probability that it could have arisen as a matter of chance, thus disproving the hypothesis.
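Here is a minimal sketch of the [10:42] test for the Galton data. The two-sided P uses the unit normal curve via `math.erfc`, which the text above allows as an interpretation of $\sqrt{F}$ when N is large. An exact t-distribution P would need a library not used here.

```python
import math

def f_zero_correlation(r, n):
    """[10:42]: F with 1 and N - 2 degrees of freedom for true correlation 0."""
    return r ** 2 * (n - 2) / (1 - r ** 2)

def two_sided_normal_p(cr):
    """Two-sided tail area of the unit normal curve beyond |cr|."""
    return math.erfc(abs(cr) / math.sqrt(2))

F = f_zero_correlation(0.3897, 324)
t = math.sqrt(F)          # "Student's t", read here as a normal deviate
print(f"F = {F:.2f}, t = {t:.2f}, P (normal approx.) = {two_sided_normal_p(t):.1e}")
```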

Suppose some theory of genetics asserts that the correlation should be 4/9 and we desire to test this hypothesis. A test of [.3897 - .4444] would not take cognizance of the fact that the 4/9 is a value assuming no grouping error. The .3897, on the other hand, has involved standard deviations in which, according to Sheppard's correction for coarseness of grouping [6:49], there is an appreciable error. We could correct the observed correlation coefficient for coarseness of grouping and then compute F, but this would not be precise, for the entire theory of F and its distribution involved no such procedure. We must therefore keep the observed value .3897 exactly as it is and alter the theoretical value, 4/9, by an amount which compensates for the coarseness of grouping. Sheppard's correction for the variance is

$${}_cV = V - \frac{i^2}{12} = \left(1 - \frac{i^2}{12V}\right)V \qquad \text{Sheppard's correction to } V \text{ for coarseness of grouping [see 6:49]}$$

in which V is the observed value and i the size of the grouping interval of the X's. If we take the reducing factor $[1 - i^2/(12V)]$ as a serviceable estimate of that to apply to the theoretical situation, we have

$${}_cr_{12}^2 = \frac{c_{12}^2}{{}_cV_1\,{}_cV_2} = \frac{c_{12}^2}{V_1V_2\left(1 - \frac{i_1^2}{12V_1}\right)\left(1 - \frac{i_2^2}{12V_2}\right)}$$

yielding the theoretical $\tilde{r}^2$ value, stepped down for coarseness of grouping,

$$\tilde{r}_{12}^2 = {}_c\tilde{r}_{12}^2\left(1 - \frac{i_1^2}{12V_1}\right)\left(1 - \frac{i_2^2}{12V_2}\right) \qquad [10:48]$$


For the Galton data $i_1 = i_2 = 1.0$ (not .5 as was the case in the computation involving ξ and ζ); $V_1 = 2.1108$; $V_2 = 4.8756$; and ${}_c\tilde{r}_{12} = 4/9$. We accordingly obtain $\tilde{r}_{12} = .4318$, and the difference that concerns us is $(r_{12} - \tilde{r}_{12}) = (.3897 - .4318)$. Transforming r and $\tilde{r}$ into z and $\tilde{z}$, we obtain z = .4114 and $\tilde{z}$ = .4621, so that the [10:45] critical ratio is $(.4114 - .4621)\sqrt{321}$, which = -.9084. Suppose r is a chance deviate from $\tilde{r}$. The probability that a positive or negative difference as great as this would then arise as a matter of chance is .36. Thus the data are in excellent conformity with the hypothesis ${}_c\tilde{r}$ (i.e., $\tilde{r}$ without a grouping error) = 4/9.

By [10:47] the critical ratio is -.8907, and the corresponding P, assuming normality, is .37. This, though in error, is not sufficiently so to lead to a different practical conclusion.
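The whole test can be sketched in Python. The sketch steps 4/9 down by [10:48], applies Fisher's transformation [10:43] and the critical ratio [10:45], and for comparison computes [10:47]. P values are two-sided normal-curve values from `math.erfc`, and the helper names are hypothetical.

```python
import math

def stepped_down(r_true, v1, v2, i1, i2):
    """[10:48]: theoretical r reduced for coarseness of grouping."""
    factor = (1 - i1 ** 2 / (12 * v1)) * (1 - i2 ** 2 / (12 * v2))
    return r_true * math.sqrt(factor)

def fisher_z(r):
    """[10:43]: z = 1/2 ln((1 + r) / (1 - r))."""
    return 0.5 * math.log((1 + r) / (1 - r))

def two_sided_p(cr):
    return math.erfc(abs(cr) / math.sqrt(2))

N, r = 324, 0.3897
r_tilde = stepped_down(4 / 9, 2.1108, 4.8756, 1.0, 1.0)
cr_z = (fisher_z(r) - fisher_z(r_tilde)) * math.sqrt(N - 3)   # [10:45]
cr_47 = (r - r_tilde) * math.sqrt(N - 2) / (1 - r ** 2)        # [10:47]
print(f"r~ = {r_tilde:.4f}")
print(f"z = {fisher_z(r):.4f}, z~ = {fisher_z(r_tilde):.4f}")
print(f"[10:45] CR = {cr_z:.4f}, P = {two_sided_p(cr_z):.2f}")
print(f"[10:47] CR = {cr_47:.4f}, P = {two_sided_p(cr_47):.2f}")
```

The sketch carries $\tilde{r}$ unrounded (about .43184) rather than .4318. Its critical ratios therefore differ from the text's in the third or fourth place (about -.909 and -.892), while the P values agree to two places.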
SECTION 8. AVERAGING CORRELATION COEFFICIENTS

To illustrate the combinatorial excellence of z from r, let us assume that a number of determinations of the correlation between mid-parent height and offspring height were made and that, except for the numbers of cases involved, they were equally trustworthy. Assume the results to be

an r of .3897 from a sample of 324
an r of .3542 from a sample of 156
an r of .4421 from a sample of 13

An r based upon all the data is desired. The procedure is to transform each r into a z, weight these inversely as their variance errors, and average. Thus

$r = .3897 \leftrightarrow z = .4114$, which is to be weighted 321
$r = .3542 \leftrightarrow z = .3702$, which is to be weighted 153
$r = .4421 \leftrightarrow z = .4748$, which is to be weighted 10

$$\text{Mean } z = \frac{321(.4114) + 153(.3702) + 10(.4748)}{321 + 153 + 10} = .3997 \leftrightarrow r = .3797$$

This mean z and equivalent r are as reliable as if computed from a sample of 487, which = 321 + 153 + 10 + 3.
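Here is a sketch of the averaging procedure in Python. It relies on `math.atanh(r)` being equal to $\frac{1}{2}\ln[(1+r)/(1-r)]$, which is Fisher's z of [10:43]. Each z is weighted by N - 3, the reciprocal of its variance error [10:44].

```python
import math

def average_r(samples):
    """samples: list of (r, N) pairs; returns (mean z, equivalent r, equivalent N)."""
    weights = [n - 3 for _, n in samples]
    zs = [math.atanh(r) for r, _ in samples]   # atanh(r) = 1/2 ln((1+r)/(1-r))
    mean_z = sum(w * z for w, z in zip(weights, zs)) / sum(weights)
    return mean_z, math.tanh(mean_z), sum(weights) + 3

mean_z, r_bar, n_eff = average_r([(0.3897, 324), (0.3542, 156), (0.4421, 13)])
print(f"mean z = {mean_z:.4f}, r = {r_bar:.4f}, equivalent N = {n_eff}")
```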

SECTION 9. THE TRUSTWORTHINESS OF A POINT ON THE REGRESSION LINE

The regression [10:01] enables us to estimate $X_0$ knowing $X_1$. In this case the error of estimate is $(X_0 - \tilde{X}_0)$ and the variance error is

$$V_{0.1} = V_0(1 - r_{01}^2) \qquad \text{Variance of the errors of estimate when } X_0 \text{ is estimated from } X_1 \text{ (see [10:09] and [10:24])}$$

The square root of this is the standard error of estimate. It is the crucial measure giving the precision with which we can estimate one variable from a knowledge of another. Since the quantities $(X_0 - \tilde{X}_0)$ are "errors," we may regularly assume them to be normally distributed with a mean of zero and a standard deviation of

$$\sigma_{0.1} = \sqrt{V_{0.1}} = \sigma_0\sqrt{1 - r_{01}^2} \qquad \text{Standard error of estimate (see [10:09] and [10:24])} \quad [10:49]$$

If our concern is not with individual $X_0$ scores, but with the difference between the regression line found and the true regression line, then we are interested in the variance errors of $M_0$ and $b_{01}$ as already discussed. We may also be interested in the trustworthiness of some specific $\tilde{X}_0$ corresponding to a specific $X_1$. The variance error of this has been given by Pearson (1913 Freq.). With the minor modification of changing N to N - 2, it is

$$V_{(\tilde{X}_0)} = \frac{V_0(1 - r_{01}^2)}{N-2}\left[1 + \frac{(X_1 - M_1)^2}{V_1}\right] \qquad [10:50]$$

Variance error of a point on a regression line when distributions of arrays are similar

We can illustrate the use of this formula by citing a problem arising in the "standardization" of educational or psychological tests. Let $X_1$ be the score on a comprehensive and highly reliable* psychological or educational test which is applicable to a wide range of talent. We will call this the "anchor test"; adequate "norms" exist for it. Let $X_0$ be a score upon a new test in the process of being "standardized," or having norms established. The correlation between it and $X_1$ for a sample of N has been obtained. We desire to tie the new test to the anchor test and secure a set of parallel scores. We could then assert that $\tilde{X}_{0.1a}$ is the norm for those of ability $X_{1a}$, $\tilde{X}_{0.1b}$ the norm for those of ability $X_{1b}$, etc. Introducing this value $X_{1a}$ into [10:01] and noting that $\tilde{X}_{0.1} = \tilde{X}_0$ yields $\tilde{X}_{0.1a}$, which is the desired parallel score, or equivalent score, to $X_{1a}$. Substituting $X_{1a}$ for $X_1$ in [10:50] gives the variance error of this equivalent score.

Clearly, if $r_{01}$ is high and $X_{1a}$ does not differ greatly from $M_1$, we obtain $\tilde{X}_{0.1a}$ with high precision even though N is rather small. This is thus a means, at moderate testing cost, of establishing the norms of a new test by tying them to those of a highly reliable previous measure.

* If not highly reliable, a modification of the procedure here outlined, incorporating the reliability of the $X_1$ scores, is necessary.
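Here is a sketch of [10:50] in Python. It assumes that the regression equation [10:01] has the form $\tilde{X}_0 = M_0 + b_{01}(X_1 - M_1)$. Purely for illustration, the Galton constants stand in for an anchor test $X_1$ and a new test $X_0$, and the score $X_{1a} = 70$ is arbitrary.

```python
import math

def equivalent_score(x1a, m0, m1, b01, v0, v1, r01, n):
    """Equivalent score for X1a and its standard error by [10:50]."""
    estimate = m0 + b01 * (x1a - m1)                      # regression [10:01]
    var_err = (v0 * (1 - r01 ** 2) / (n - 2)
               * (1 + (x1a - m1) ** 2 / v1))              # [10:50]
    return estimate, math.sqrt(var_err)

est, se = equivalent_score(70.0, m0=68.22, m1=68.31, b01=0.5923,
                           v0=4.8756, v1=2.1108, r01=0.3897, n=324)
print(f"equivalent score = {est:.2f}, standard error = {se:.3f}")
```

At $X_{1a} = M_1$ the bracket in [10:50] equals 1, and the result reduces to the square root of [10:40]. The standard error grows as $X_{1a}$ moves away from $M_1$.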

SECTION 10. CORRELATION BETWEEN RANKS 

A simple formula for the correlation between ranks is readily derived from the product-moment formula by expressing the covariance as a function of the variance of the differences of paired measures $x_1$, $x_2$:

$$V_{(x_1 - x_2)} = V_1 + V_2 - 2c_{12} = V_1 + V_2 - 2\sigma_1\sigma_2r_{12} \qquad [10:51]$$

Thus

$$r_{12} = \frac{V_1 + V_2 - V_{(x_1 - x_2)}}{2\sigma_1\sigma_2} \qquad \text{Difference formula for product moment } r \quad [10:52]$$


We record, as being at times convenient, a similarly derived formula based upon the variance of the sum of paired measures:

$$r_{12} = \frac{V_{(x_1 + x_2)} - V_1 - V_2}{2\sigma_1\sigma_2} \qquad \text{Sum formula for product moment } r \quad [10:53]$$

or, combining these two,

$$r_{12} = \frac{V_{(x_1 + x_2)} - V_{(x_1 - x_2)}}{4\sigma_1\sigma_2} \qquad \text{Sum and difference formula for product moment } r \quad [10:54]$$


If, as in the case of ranked data involving no missing ranks from 1 to N, we have $V_1 = V_2$, formula [10:52] simplifies. Formula [14:128] gives the sum of N numbers, 1, 2, 3, ..., N, and of their squares. From it we have, if X stands for the successive ranks 1, 2, 3, ..., N,

$$\sum X = \frac{N(N+1)}{2}, \quad \text{so that} \quad M = \frac{N+1}{2} \qquad \text{The mean of } N \text{ ranks} \quad [10:55]$$

$$\sum X^2 = \frac{N^3}{3} + \frac{N^2}{2} + \frac{N}{6} = \frac{N(2N+1)(N+1)}{6}, \quad \text{so that} \quad \text{Mean } X^2 = \frac{(N+1)(2N+1)}{6} \qquad [10:56]$$

$$V = \text{Mean } X^2 - M^2 = \frac{N^2 - 1}{12} \qquad \text{The variance of } N \text{ ranks} \quad [10:57]$$


Let d stand for the differences in paired ranks, i.e., for $(x_1 - x_2)$ of formula [10:52] when the scores in question are ranks, and let $\rho_{12}$ be the product moment correlation between ranks. Since the mean of d is zero, $V_{(x_1 - x_2)} = \sum d^2/N$; then, utilizing [10:52] and [10:57],

$$\rho_{12} = 1 - \frac{6\sum d^2}{N(N^2 - 1)} \qquad \text{Spearman rank correlation coefficient} \quad [10:58]$$

This formula must not be confused with Spearman's "foot-rule" rank correlation formula. That formula, though simpler, has such shortcomings as never to be preferred to [10:58].
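This sketch uses two made-up rankings. It shows that [10:58] gives the same value as the product moment r of the ranks computed by the sum-and-difference formula [10:54]. It also shows that the variance of N untied ranks is $(N^2 - 1)/12$, as in [10:57].

```python
def variance(values):
    """Population variance (divisor N), as used throughout the chapter."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def spearman_rho(rank1, rank2):
    """[10:58] for untied ranks."""
    n = len(rank1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank1, rank2))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

def r_sum_difference(x1, x2):
    """[10:54]: product moment r from variances of sums and differences."""
    v_sum = variance([a + b for a, b in zip(x1, x2)])
    v_diff = variance([a - b for a, b in zip(x1, x2)])
    return (v_sum - v_diff) / (4 * (variance(x1) * variance(x2)) ** 0.5)

rank1 = [1, 2, 3, 4, 5, 6, 7, 8]       # hypothetical rankings
rank2 = [2, 1, 4, 3, 6, 5, 8, 7]
print(spearman_rho(rank1, rank2), r_sum_difference(rank1, rank2))
```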

The coefficient $\rho_{12}$ is exactly a product moment correlation coefficient when scores are ranks. If it is used in a regression equation in which the predictor variable is expressed as a rank, it should not be altered in any way. In a theoretical treatment, wherein the assumption is made that the real distributions underlying the ranks are normal, Pearson (1907) gives a formula for estimating from ρ the r that would have been obtained had the continuous normally distributed variables been used. It is

$$r = 2\sin\left(\frac{\pi}{6}\rho\right) \qquad \text{Pearson's correction to Spearman's } \rho \quad [10:59]$$

This correction is small and in general it suffices to report ρ and its standard error. The standard errors of ρ and of r-from-ρ have been given by Pearson. After first substituting (N - 2) for N, to allow for the loss of two degrees of freedom, they are as herewith:




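$$\sigma_\rho = \frac{1 - \rho^2}{\sqrt{N-2}}\left(1 + .086\rho^2 + .013\rho^4 + .002\rho^6\right) \qquad \text{Standard error of Spearman's } \rho \quad [10:60]$$

$$\sigma_{r\,(\text{from }\rho)} = 1.0472\,\frac{1 - r^2}{\sqrt{N-2}}\left(1 + .042r^2 + .008r^4 + .002r^6\right) \qquad \text{Standard error of } r \text{ from } \rho \quad [10:61]$$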

Suppose one of the paired measures is a normally distributed variate and the other a series of ranks, though the underlying variable is in fact normally distributed, and let the product moment correlation between them be called ρ′. Pearson (1914) gives the following formula to estimate the correlation had the normally distributed scores been available:

$$r = \sqrt{\frac{\pi}{3}}\,\rho' = 1.0233\rho' \qquad [10:62]$$
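Here is a short sketch of Pearson's two corrections. It also checks two constants: 1.0233 in [10:62] is $\sqrt{\pi/3}$, and 1.0472 in [10:61] is $\pi/3$, both to four decimals. The ρ values are arbitrary.

```python
import math

def r_from_rho(rho):
    """[10:59]: r estimated from Spearman's rho, assuming normal underlying variables."""
    return 2 * math.sin(math.pi * rho / 6)

def r_from_rank_vs_normal(rho_prime):
    """[10:62]: r when one series is ranks and the other normally distributed."""
    return math.sqrt(math.pi / 3) * rho_prime

for rho in (0.0, 0.3, 0.6, 0.9, 1.0):
    print(f"rho = {rho:.1f}  ->  r = {r_from_rho(rho):.4f}")
print(f"sqrt(pi/3) = {math.sqrt(math.pi / 3):.4f}, pi/3 = {math.pi / 3:.4f}")
```

The printed values show how small the [10:59] correction is. At ρ = .5, for example, it raises the value only to about .518.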


Correction in ρ for ties in ranks. It frequently occurs that two or more ranks are tied. In this case all of the tied ranks are given a single value, which is the average of the ranks represented by the tied measures. For example, the 11 scores

10, 9, 8, 8, 7, 6, 6, 6, 5, 4, 4

are given rank values

1, 2, 3.5, 3.5, 5, 7, 7, 7, 9, 10.5, 10.5

The mean of these is exactly the value given by [10:55], but mean $X^2$ for these is not identical with the answer given by [10:56], and V is not identical with the value given by [10:57]. Horn (1942) has noted that the correction in [10:57] is simple. He has also given tables and approximations to facilitate computation when there are different numbers of ranks in the ties.

Let k consecutive ranks X, X+1, X+2, ..., X+k-1 be replaced by their average, X + .5(k-1), so that their squares $X^2, (X+1)^2, \ldots, (X+k-1)^2$ are replaced by $[X + .5(k-1)]^2$ repeated k times. The sum of the latter squares is less than the former by an amount equal to k times the variance of k ranks, thus

$$\sum_{X}^{X+k-1} X^2 = k\left[X + .5(k-1)\right]^2 + k\cdot\frac{k^2 - 1}{12} = k\left[X + .5(k-1)\right]^2 + \frac{k(k^2 - 1)}{12} \qquad [10:63]$$
We may rewrite [10:58] thus

$$\rho_{12} = 1 - \frac{\sum d^2}{2\sqrt{NV_1 \cdot NV_2}}, \qquad NV_1 = NV_2 = \frac{N(N^2 - 1)}{12}$$

Herein the V's are given by [10:57]. If there are tied ranks, however, $NV_1$ is too large by an amount equal to the sum of all the functions $k(k^2 - 1)/12$ for all the ties in the first series. Similarly, $NV_2$ is too large by the sum of all the functions $k(k^2 - 1)/12$ for the second series. To illustrate in the case of the 11 ranks just given: the sum of the squares of the measures 1, 2, 3.5, 3.5, 5, 7, 7, 7, 9, 10.5, 10.5 is equal to the sum of the squares of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 minus $\{[2(2^2 - 1)/12] + [3(3^2 - 1)/12] + [2(2^2 - 1)/12]\}$. The deduction arises from the ties 3.5, 3.5 and 7, 7, 7 and 10.5, 10.5, and it is 3 (= .5 + 2 + .5). We accordingly, in this instance, deduct 3 and set

$$NV_1 = \frac{11(11^2 - 1)}{12} - 3 = 107$$

Of course, a similar correction would be necessary for $NV_2$, so that the formula for ρ becomes

$$\rho_{12} = 1 - \frac{6\sum d^2}{\sqrt{\left[N(N^2 - 1) - \sum k_1(k_1^2 - 1)\right]\left[N(N^2 - 1) - \sum k_2(k_2^2 - 1)\right]}} \qquad \rho \text{ corrected for ties in ranks} \quad [10:64]$$

Herewith is a short table of $k(k^2 - 1)$ values for different numbers of ties in ranks:

TABLE XC
CORRECTIONS WHEN COMPUTING ρ FROM TIED RANKS

k: No. of ranks tied      k(k² - 1)
        2                      6
        3                     24
        4                     60
        5                    120
        6                    210
        7                    336
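Here is a sketch of the tie procedure. It assumes that scores are ranked from the highest (rank 1) downward, as in the 11-score example. Tied scores receive the average of the ranks they occupy, the deduction $\sum k(k^2 - 1)$ is accumulated for each series, and [10:64] combines them. The function names are hypothetical.

```python
from collections import Counter

def average_ranks(scores):
    """Rank from the highest score down; tied scores share their average rank."""
    order = sorted(scores, reverse=True)
    first = {}                          # score -> first rank position it occupies
    for pos, s in enumerate(order, start=1):
        first.setdefault(s, pos)
    counts = Counter(scores)
    # k tied values occupying ranks p..p+k-1 get the average p + (k-1)/2
    return [first[s] + (counts[s] - 1) / 2 for s in scores]

def tie_total(scores):
    """Sum of k(k^2 - 1) over all groups of k tied scores."""
    return sum(k * (k * k - 1) for k in Counter(scores).values() if k > 1)

def rho_corrected(scores1, scores2):
    """[10:64]: Spearman's rho corrected for ties in both series."""
    r1, r2 = average_ranks(scores1), average_ranks(scores2)
    n = len(r1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    base = n * (n * n - 1)
    denom = ((base - tie_total(scores1)) * (base - tie_total(scores2))) ** 0.5
    return 1 - 6 * sum_d2 / denom

scores = [10, 9, 8, 8, 7, 6, 6, 6, 5, 4, 4]
ranks = average_ranks(scores)
n = len(ranks)
print(ranks)
print(n * (n * n - 1) / 12 - tie_total(scores) / 12)   # N*V1 after the deduction
```

For the example, the ranks come out as 1, 2, 3.5, 3.5, 5, 7, 7, 7, 9, 10.5, 10.5, and the total $\sum k(k^2 - 1)$ is 36. $NV_1$ is then 110 - 3 = 107, as computed above.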
ated variable and Y the two-category variable. With such data there are the straightforward problems of estimating X knowing Y and of estimating Y knowing X. In addition, we may be interested in estimating the correlation and the regressions maintaining between X and Y′, a continuous normally distributed variable which has been forced into the dichotomy given by Y.

The essential statistics obtained from a scatter diagram are shown in Chart XII.

The units of measurement of X may well be those of the raw scores, grouped in not less than 12 classes if desired. In the case of Y, simplicity results if the lower score is called 0 and the higher score called 1. If the proportion of cases receiving a 0 score is q, and p = 1 - q, then the constants of the Y distribution are:

$$M_y = p \qquad [10:65]$$

$$V_y = pq \qquad [10:66]$$

For the X series we compute $M_x$ and $V_x$.

The number of cases is N and the covariance is

$$c_{xy} = \frac{\sum XY}{N} - M_xM_y = \frac{0\cdot\sum X_l + 1\cdot\sum X_u}{N} - pM_x = pq(M_u - M_l) \qquad [10:67]$$

in which the $X_l$ measures are those paired with a zero Y score and the $X_u$ measures are those paired with a Y score of 1. Also $M_l$ is the mean of the X measures in the lower group and $M_u$ the mean of the X measures in the upper group. (The last step uses $\sum X_u/N = pM_u$ and $M_x = qM_l + pM_u$.)
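$$b_{xy} = \frac{c_{xy}}{V_y} = M_u - M_l \qquad \text{Regression of continuous variable upon two-category variable} \quad [10:68]$$

$$b_{yx} = \frac{c_{xy}}{V_x} = \frac{pq(M_u - M_l)}{V_x} \qquad \text{Regression of two-category variable upon continuous variable} \quad [10:69]$$

$$r_{xy} = \frac{c_{xy}}{\sigma_x\sigma_y} = \frac{(M_u - M_l)\sqrt{pq}}{\sigma_x} \qquad \text{Biserial product moment } r \quad [10:70]$$

$$\tilde{X} = (M_u - M_l)Y + M_l \qquad [10:71]$$

$$\tilde{Y} = \frac{pq(M_u - M_l)}{V_x}(X - M_x) + p \qquad [10:72]$$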


The variance ratio test of the regression coefficient $(M_u - M_l)$ is given by [10:39], and the variance ratio test of $M_x$ by [10:38].

If N is small, these tests, [10:39] and [10:38], for the regression $(M_u - M_l)pq/V_x$ and the mean, $M_y$ (= p), should be applied with misgiving because the variances involved derive from two-point distributions.
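Here is a sketch of the biserial product moment constants in Python, for a small invented set of X scores paired with 0/1 scores. The values follow [10:65] through [10:70] directly. As a check, the covariance and r agree with the ordinary product moment formulas applied to the same pairs.

```python
import math

def two_category_constants(x, y):
    """Constants [10:65]-[10:70] for continuous x and 0/1 scores y."""
    n = len(x)
    upper = [a for a, b in zip(x, y) if b == 1]
    lower = [a for a, b in zip(x, y) if b == 0]
    p = len(upper) / n
    q = 1 - p
    m_u, m_l = sum(upper) / len(upper), sum(lower) / len(lower)
    m_x = sum(x) / n
    v_x = sum((a - m_x) ** 2 for a in x) / n
    c_xy = p * q * (m_u - m_l)                                    # [10:67]
    return {
        "p": p, "q": q, "c_xy": c_xy,
        "b_xy": m_u - m_l,                                        # [10:68]
        "b_yx": c_xy / v_x,                                       # [10:69]
        "r_xy": (m_u - m_l) * math.sqrt(p * q) / math.sqrt(v_x),  # [10:70]
    }

x = [12, 15, 9, 20, 17, 11, 14, 19, 8, 16]      # hypothetical X scores
y = [0, 1, 0, 1, 1, 0, 0, 1, 0, 1]              # paired 0/1 scores
k = two_category_constants(x, y)
print(k)
```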

A test alternative to [10:39] is the critical ratio test $(M_u - M_l)/\sigma_{(M_u - M_l)}$, which squared is

$$\frac{(M_u - M_l)^2}{\frac{V_u}{N_u - 1} + \frac{V_l}{N_l - 1}} \qquad [10:73]$$

in which $N_u = Np$ and $N_l = Nq$ are the numbers of cases in the upper and lower groups, and $V_u$ and $V_l$ are the variances of their X measures. If, in the population, $V_uN_u/(N_u - 1) = V_lN_l/(N_l - 1)$, then approximately

$$\frac{V_uN_u}{N_u - 1} = \frac{V_lN_l}{N_l - 1} = \frac{NV_x(1 - r_{xy}^2)}{N-2}$$

and the just recorded squared critical ratio becomes

$$\frac{(M_u - M_l)^2pq(N-2)}{V_x(1 - r_{xy}^2)}$$

which is exactly the [10:39] variance ratio test of the regression coefficient $(M_u - M_l)$. Thus, if the data are homoscedastic, the [10:39] test is adequate. Here equal variability of arrays is defined as $V_uN_u/(N_u - 1) = V_lN_l/(N_l - 1)$, not by the older definition, $V_u = V_l$. If $V_u$ does not approximately equal $V_l$, the squared critical ratio test [10:73] is the more precise.

Estimates of biserial relationships, assuming the dichotomy is one forced upon an underlying normal variate. Under this assumption we can compute the correlation and the regression equations. The assumption itself, however, is always open to serious questioning, though how serious we cannot express quantitatively or in probability terms. A computation of the standard errors of these modified correlation and regression constants is therefore objectionable. It could only lead to fiducial results in which the presumably major source of error (assumption of normality) was neglected while the minor source (small size of sample) was treated as though it were the sole source of error.

Suppose the Y measures recorded in the two-category manner had derived from a normal distribution having a mean of zero and a variance of one. The means of the lower and upper categories would then be -z/q and z/p respectively. Here z is the ordinate in a unit normal distribution at the point for which the proportion of cases to the left is q and the proportion to the right is p, as given by [8:25] and [8:26]. Further, consider the regression of X upon this normal variate, which we designate y′ (the same as x of tables of the unit normal distribution). It will be a line passing through the intersection of $M_l$ and -z/q and the intersection of $M_u$ and z/p. In Chart XII, the sum of the absolute values of the distances $x_2$ and $x_1$ equals $(M_u - M_l)$. The sum of the absolute values of the distances $y_2$ and $y_1$ is, in terms of the unit normal distribution,

$$\frac{z}{p} + \frac{z}{q}, \quad \text{which} = \frac{z}{pq}, \quad \text{so that}$$

$$b_{xy'} = \frac{(M_u - M_l)pq}{z} \qquad \text{Regression of actual continuous variate upon assumed normal variate} \quad [10:74]$$

Of course $V_{y'} = 1$, so that

$$r_{xy'} = \frac{(M_u - M_l)pq}{z\sigma_x} \qquad \text{Biserial } r \quad [10:75]$$

Xy 


The designation of this as n biserial r" follows 
long established practice, but a more accurate 
designation would be "biserial r corrected for an 
enforced dichotomy upon a normal variate;" 

The regression equation of X upon y' serves
no practical need, since one does not actually
have y'. The other regression is

$$\bar{y}' = \frac{(M_u - M_l)\,pq}{z\,\sigma_x^2}\,x \qquad [10:76]$$

The variance error of estimate of y' (assuming
zero error in the assumption of normality of y') is

$$V_{y'\cdot x} = 1 - r_{xy'}^2 \qquad [10:77]$$


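The variance error of biserial r, as given by
Soper (1914), is

$$V_{\text{biser. } r} = \frac{1}{N}\left\{\frac{pq}{z^2} + \left[\frac{pq\,x^2}{z^2} + \frac{(p-q)\,x}{z} - \frac{5}{2}\right](\text{biser. } r)^2 + (\text{biser. } r)^4\right\} \qquad [10:78]$$

in which x is the deviate in the unit normal
distribution at the point of dichotomy (the point
having q of the cases to its left). Of course this
variance error does not have incorporated in it
the error due to the assumption of normality. For
dichotomies wherein q is not less than .05, a close
approximation to [10:78] is

$$V_{\text{biser. } r} = \frac{1}{N}\left[\frac{\sqrt{pq}}{z} - (\text{biser. } r)^2\right]^2 \qquad [10:79]$$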
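The formulas [10:75], [10:78], and [10:79] can be checked numerically. The sketch below is a minimal Python illustration, not the author's procedure: the function name `biserial` is hypothetical, and taking x as the deviate with q of the cases to its left is an assumption made here (under that convention the full Soper formula and its approximation agree closely, as the text says they should). The inputs are the summary constants printed with Table X E.

```python
from math import sqrt
from statistics import NormalDist

def biserial(m_upper, m_lower, sigma_x, p, n):
    """Biserial r [10:75], product moment (point) r, and the two standard errors."""
    q = 1 - p
    unit = NormalDist()                  # the assumed unit normal variate y'
    x = unit.inv_cdf(q)                  # deviate with q of the cases to its left (assumed convention)
    z = unit.pdf(x)                      # ordinate at the point of dichotomy
    diff = m_upper - m_lower
    r_bis = diff * p * q / (z * sigma_x)             # [10:75]
    r_point = diff * sqrt(p * q) / sigma_x           # product moment r with a 0/1 variable
    v_soper = (p * q / z**2
               + (p * q * x**2 / z**2 + (p - q) * x / z - 2.5) * r_bis**2
               + r_bis**4) / n                       # [10:78]
    v_approx = (sqrt(p * q) / z - r_bis**2) ** 2 / n  # [10:79]
    return r_bis, r_point, sqrt(v_soper), sqrt(v_approx)

# Constants printed with Table X E: M_u, M_l, sigma_x, p, N
r_bis, r_point, se_soper, se_approx = biserial(98.758, 54.987, 36.606, 0.28735, 48102)
print(round(r_bis, 3), round(r_point, 3))  # 0.718 0.541
```

Neither standard error includes any allowance for error in the normality assumption itself, which is the text's main reservation.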


A common practice in the statistical analysis
of true-false, or right-wrong, test items has
been to judge their excellence by the size of
the biserial correlations between them and a
graduated criterion, or by the critical ratios
biser. r/σ_{biser. r}. The continuous variable y'
is not actually available, nor presumably ever
will be available, for it is not contemplated that
the future scoring of such an item will ever be
other than "right" or "wrong." In view of this, and
of the shortcomings of the formula for the standard
error of biserial r, it would seem better to employ
biserial product moment r and the precise tests
connected therewith. The hazards of employing
biserial r instead of a real product moment r,
such as is biserial product moment r, are increased
if several items of a test are combined in a
multiple regression equation to estimate a
graduated criterion, (a) because the best weights,
in the minimal variance error of estimate sense,
derive from the product moment r's and cannot be
improved upon in any manner, except by resort to
higher order, or curvilinear, regression, and (b)
because the precise tests existing for product
moment multiple and partial correlation do not
apply if correlation coefficients other than
product moment coefficients are employed. The
observations in Section 12 (tetrachoric r) upon
dichotomies and range of talent apply in biserial
r situations.

The writer has been kindly supplied with the 
data of Table X D, which arose in the experimental 
work of L. R. Waldron, State College Station, 
Fargo, North Dakota. 

TABLE X D

CERTAIN BISERIAL DATA

X     No. of cases (Y = 0)     No. of cases (Y = 1)

Resulting constants


42 
Here the assumption of normality in the Y variate
is so untenable as to lead to an absurd biserial
r. Should a situation be only a little less
extreme than this, the unsoundness would not be
revealed by an impossible r value. The student
should make the assumption of normality only
after careful consideration, and he should not
expect an unsoundness in this respect to be
automatically revealed by absurd statistical
outcomes. It seems to the writer fairly
satisfactory to use biserial r as a terminal
statistic in connection with the data of Table X E,
drawn from psychological examination in the United
States Army (see Yerkes 1921).


TABLE X E

SCHOOL ATTAINMENT AND ARMY ALPHA SCORES

                      NUMBER OF MEN WHO LEFT SCHOOL
SCORE IN ARMY ALPHA   BELOW THE      ABOVE THE
INTELLIGENCE TEST     9TH GRADE      8TH GRADE

205-212                                      ?
200-204                                      3
195-199                                     14
190-194                                     17
185-189                       ?             49
180-184                       2             54
175-179                       8             78
170-174                      12            126
165-169                      18            149
160-164                       ?            200
155-159                      20            244
150-154                      45            305
145-149                      58            352
140-144                       ?            338
135-139                     101            407
130-134                     145            507
125-129                     190            528
120-124                     216            530
115-119                     317            643
110-114                     393            674
105-109                     507            682
100-104                     582            691
95-99                       761            712
90-94                       908            725
85-89                       993            769
80-84                     1,181            693
75-79                     1,371            642
70-74                     1,604            648
65-69                     1,709            567
60-64                     1,962            581
55-59                     2,249            430
50-54                     2,272            346
45-49                     2,429            305
40-44                     2,455            229
35-39                     2,473            200
30-34                     2,490            154
25-29                     2,213            106
20-24                     1,835             60
15-19                     1,511             42
10-14                       545             13
5-9                         432              5
0-4                         183              ?
TOTAL                    34,280         13,822

N = 48,102

M_l = 54.987     M_u = 98.758     σ_x = 36.606
p = .28735     z = .34083

biserial product moment r = .541

biserial r = .718
Considerations in support of the utility of
this biserial correlation are (a) the naive
concept of the correlation between intelligence
test score and schooling would assuredly involve
the amount of schooling as a graduated variable,
(b) the future actual treatment of schooling as
a graduated variable is entirely possible, (c)
the distribution of "highest school grade
attended" is probably not radically non-normal,
(d) the correlation is so high and the sample
so large that a precise test of the significance
of (biser. r − 0) is not needed, and (e)
presumably this statistic is a terminal statistic
and will not be incorporated into further
algebraic treatment.

SECTION 12. TWO-BY-TWO-FOLD CORRELATION

Phi: The issue of continuity which arose in
connection with the Y variable in the case of
biserial correlation here arises in connection
with both variables. We first ascertain the
pertinent statistical constants for the actual
point distributions as given by the data. In
this case the product moment correlation has
been known as Yule's φ, and is here designated
φ. It is, in every detail, a rigorous product
moment correlation coefficient, and the use of
φ instead of r to designate it merely calls
attention to the fact that it is computed from a
two-by-two-fold. Its computation is simple, and
its use calls only for a certain reservation in
the case of variance ratio or chi-square tests
applied to very small samples.

It has been customary to plot two-by-two-fold
scatter diagrams as in Charts X III and X IV,
so that the proportions p and p' are each equal
to or greater than .5, even though this involves
divergence from the usual scaling of abscissa
from left to right and of ordinate from bottom
to top. The X and Y scales are indicated in
Chart X IV.

CHART X III

TWO-BY-TWO-FOLD SCATTER DIAGRAM

     a        b        a+b     1
     c        d        c+d     0
    a+c      b+d        N
     1        0


CHART X IV

TWO-BY-TWO-FOLD SCATTER DIAGRAM
BASED UPON PROPORTIONS

                               X
     α        β        p       1
     γ        δ        q       0
     p'       q'       1
     1        0
          Y




α = a/N; β = b/N; γ = c/N; δ = d/N; p = (a+b)/N;
p' = (a+c)/N. It is readily established that
M_x = p; V_x = pq; M_y = p'; V_y = p'q'; and the
covariance

$$C_{xy} = \frac{ad - bc}{N^2} = \alpha\delta - \beta\gamma$$

$$\phi = \frac{C_{xy}}{\sigma_x\sigma_y} = \frac{ad - bc}{N^2\sigma_x\sigma_y} = \frac{\alpha\delta - \beta\gamma}{\sqrt{pq\,p'q'}} \qquad \text{Product moment } r \text{ in a two-by-two-fold} \qquad [10:81]$$

$$b_{xy} = \frac{\alpha\delta - \beta\gamma}{p'q'}, \quad \text{and similarly for } b_{yx} \qquad [10:82]$$

$$\bar{X} = b_{xy}Y + p - b_{xy}\,p', \quad \text{and similarly for } Y \qquad [10:83]$$

$$F_{1,\,N-2} = \frac{(p - \bar{p})^2 (N - 2)}{pq(1 - \phi^2)} \qquad \text{Variance ratio to test } (p - \bar{p}) \qquad [10:84]$$


This may be compared with the critical ratio
test for the mean, available when no second
variable Y is utilized. The error variance,
pq(1 − φ²), may be looked upon as the sum of (N − 2)
independent variables arising from two-point
distributions. The tabled distributions of F, or
of Student's t², assume that these independent
variances arise from normal distributions. For
the usual situations in which N is, say, greater
than 15, highly serviceable answers result from
employing the usual P from F.
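As an illustration of [10:81] to [10:84], here is a minimal Python sketch applied to the counts of Table X F (given later in this section), taking the non-medical and medical departments as the rows and "A or B" and "below B" as the columns. The function name and the a priori value p̄ = .5 are hypothetical choices made only for the example.

```python
from math import sqrt

def two_by_two(a, b, c, d, p_prior):
    """phi [10:81], regression coefficients [10:82], variance ratio [10:84]."""
    n = a + b + c + d
    alpha, beta, gamma, delta = a / n, b / n, c / n, d / n
    p, p_ = (a + b) / n, (a + c) / n        # M_x and M_y
    q, q_ = 1 - p, 1 - p_
    cov = alpha * delta - beta * gamma      # equals (ad - bc) / N^2
    phi = cov / sqrt(p * q * p_ * q_)
    b_xy = cov / (p_ * q_)                  # regression of X on Y
    b_yx = cov / (p * q)                    # regression of Y on X
    f = (p - p_prior) ** 2 * (n - 2) / (p * q * (1 - phi**2))  # F with 1 and N - 2 df
    return phi, b_xy, b_yx, f

phi, b_xy, b_yx, f = two_by_two(2940, 431, 1799, 590, p_prior=0.5)  # p_prior is hypothetical
print(round(phi, 3))  # 0.154
```

As with any product moment r, φ² equals the product of the two regression coefficients, which gives a quick check on the arithmetic.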


$$F_{1,\,N-2} = \frac{(b_{xy} - \bar{b}_{xy})^2\, p'q'(N - 2)}{pq(1 - \phi^2)} \qquad \text{Variance ratio to test } (b_{xy} - \bar{b}_{xy}) \qquad [10:85]$$

The p̄ in [10:84] and the b̄_xy in [10:85] are a
priori values, or points of reference, in which one
is interested.

For samples of small size, precise tests of
significance, as illustrated in Chapter IX,
Section 2, are preferable to tests based upon
[10:84] and [10:85].

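The variance error of φ, as derived by Yule
(see Pearson and Heron, 1913), is

$$V_{\phi} = \frac{1}{N}\left\{1 - \phi^2 + \left(\phi + \frac{\phi^2}{2}\right)\frac{(p-q)(p'-q')}{\sqrt{pq\,p'q'}} - \frac{3}{4}\phi^2\left[\left(\frac{p}{q} + \frac{q}{p} - 2\right) + \left(\frac{p'}{q'} + \frac{q'}{p'} - 2\right)\right]\right\} \qquad (q < p \text{ and } q' < p')$$

Variance error of φ from a two-by-two-fold table [10:86]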
Tetrachoric correlation: This correlation
coefficient is an estimate of the correlation to
be expected in a parent normal bivariate
distribution which, when forced into a
two-by-two-fold, yielded the observed distribution.
If an approximately normal bivariate parent may
reasonably be assumed to exist; if the issue is
theoretical, or one concerning this parent, and not
a predictive problem which must use the available
two-category measures; if a precise test of
significance is not necessary because the sample is
large and the magnitude of the correlation rather
than its existence is the matter of importance;
and if the correlation coefficient obtained is
not to be used in further connections wherein a
test of significance will be needed; then
tetrachoric r will be a useful statistic. The writer
has noted the frequent use of tetrachoric r in
multiple regression equations in psychological
research, but as these almost always lead to
issues requiring tests of significance, this
practice has serious shortcomings.

Pearson (1900) derived equation [10:87], in
which r_t is the tetrachoric correlation
coefficient; q, q', and δ are as indicated in Chart
X IV; x and z are the deviate and ordinate in a unit
normal distribution at the point of dichotomy
of the X variable; and x' and z' are the deviate and
ordinate in a unit normal distribution at the
point of dichotomy of the Y variable.

$$\frac{\delta - qq'}{zz'} = r_t + xx'\frac{r_t^2}{2!} + (x^2-1)(x'^2-1)\frac{r_t^3}{3!} + (x^3-3x)(x'^3-3x')\frac{r_t^4}{4!} + (x^4-6x^2+3)(x'^4-6x'^2+3)\frac{r_t^5}{5!} + (x^5-10x^3+15x)(x'^5-10x'^3+15x')\frac{r_t^6}{6!} + (x^6-15x^4+45x^2-15)(x'^6-15x'^4+45x'^2-15)\frac{r_t^7}{7!} + \cdots$$

Equation giving the tetrachoric coefficient
of correlation [10:87]

To express the law governing successive
coefficients of powers of r_t, let v_n w_n/(n+1)! be
the coefficient of r_t^{n+1}, v_n being a function
of x and w_n a function of x'; then v_n may be
expressed in terms of v's of a lower order:

$$v_n = x\,v_{n-1} - (n-1)\,v_{n-2}, \qquad \text{and similarly} \qquad w_n = x'\,w_{n-1} - (n-1)\,w_{n-2},$$

with

$$v_0 = 1,\; v_1 = x, \qquad \text{and similarly} \qquad w_0 = 1,\; w_1 = x'.$$

Thus the equation as written to the r_t^7 term may
be continued to any number of additional terms
desired, should it not converge rapidly enough to
make terms above the r_t^7 term negligible.

Extensive tables to facilitate the computation
of r_t have been computed by P. F. Everitt, Alice
Lee, Margaret Moul, Ethel M. Elderton, A. E. R.
Church, E. C. Fieller, and J. Pretorius. These
are given in Pearson's Tables (1914).

The most extensive and useful of these tables
give the proportionate area for different
dichotomies in both X and Y for correlations
−.95, −.90, −.85, etc., to .95, 1.00. The
interval in x and x' (designated h and k in the
tables) is .1σ. Thus, having any x, x', and δ,
one can find r_t by interpolation between two
successive tables. The tables have many other
uses, including the determining of the
proportionate frequency in a normal bivariate
distribution in any desired rectangular interval
(the sides of the rectangle being parallel to the
axes) for any of the correlations −.95, −.90,
..., 1.00.

Extensive diagrams for the computation of r_t
have been published by Chesire, Saffir, and
Thurstone (1933).

If an assumption of normality for one only of
the variables seems reasonable, the other
variable being a genuine dichotomy, the problem
reduces to that of biserial r, the graduated
variable now having but two values. These
require no correction for grouping, for it is not
postulated that any continuum underlies this
variable. We will use the data of Table X F, also
drawn from Psychological Examining in the United
States Army, to illustrate φ, biserial r from a
two-by-two-fold, and r_t.


TABLE X F

ARMY ALPHA SCORES OF LIEUTENANTS IN
MEDICAL AND NON-MEDICAL DEPARTMENTS

                                 SCORE ON ARMY INTELLIGENCE
                                       ALPHA TEST
FIRST LIEUTENANTS                A OR B     BELOW B     TOTAL
Departments other than Medical     2940         431      3371
Medical Department                 1799         590      2389
Total                              4739        1021      5760

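TABLE X G

FREQUENCIES OF TABLE X F EXPRESSED AS PROPORTIONS

                 A OR B          BELOW B
Non-medical      α = .5104       β = .0748       p = .5852
Medical          γ = .3123       δ = .1024       q = .4148
                 p' = .8227      q' = .1773

x = .215215;  z = .389809   (at the dichotomy between p and q)
x' = .925704; z' = .259915  (at the dichotomy between p' and q')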

Substituting the values given in Table X G for
δ, x, x', z, and z' in equation [10:87] and
solving for r_t by the method of successive
approximations given in Chapter XIV, Section 7,
yields r_t = .277. Computation of biserial r,
assuming a normal distribution to underlie the
intelligence test variable, yields r_{x'y} = .195.
Computation of the product moment biserial
correlation yields φ = .154.
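The value r_t = .277 can be reproduced by summing the series [10:87] with the recurrence for v_n and w_n, then solving for r_t. The Python sketch below uses bisection rather than the book's method of successive approximations (a deliberate swap), truncates the series at 20 terms, and assumes the series increases with r_t inside the search bracket. The function names are hypothetical. The deviates are recomputed from the counts of Table X F, with δ taken as the medical, below-B cell.

```python
from math import factorial
from statistics import NormalDist

def series_rhs(r, x, x_, terms=20):
    """Right-hand side of [10:87], summed up to the r**terms term."""
    v_prev, v = 1.0, x          # v_0, v_1
    w_prev, w = 1.0, x_         # w_0, w_1
    total = r                   # first term: v_0 * w_0 * r / 1!
    for n in range(1, terms):
        total += v * w * r ** (n + 1) / factorial(n + 1)
        # recurrence: v_{n+1} = x v_n - n v_{n-1}, and likewise for w
        v_prev, v = v, x * v - n * v_prev
        w_prev, w = w, x_ * w - n * w_prev
    return total

def tetrachoric_r(delta, q, q_, tol=1e-10):
    unit = NormalDist()
    x, x_ = unit.inv_cdf(q), unit.inv_cdf(q_)   # deviates with q and q' to the left
    target = (delta - q * q_) / (unit.pdf(x) * unit.pdf(x_))
    lo, hi = -0.99, 0.99        # assumes the series rises with r inside this bracket
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if series_rhs(mid, x, x_) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

n = 5760                        # Table X F
r_t = tetrachoric_r(delta=590 / n, q=2389 / n, q_=1021 / n)
print(round(r_t, 3))  # 0.277
```

The same counts give φ ≈ .154 by [10:81], so the gap between the two coefficients comes entirely from the normality assumption that r_t adds.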

To predict the department, knowing the
dichotomous army alpha score, calls for the use of φ.
The correlation is positive when non-medical
departments are at the high end of the scale and
the medical department at the low end. To answer
the theoretical issue as to the indicated
correlation if adequate X and Y variables were
available, neither φ nor r_{x'y} is very satisfactory.
r_{x'y} accepts the point distribution of the
"medical department versus other departments"
variable as reasonable, and this is certainly very
questionable. φ accepts the point nature of this
variable and also of the intelligence variable, so
that it is doubly open to question. The writer
is prone to put more confidence in r_t as
descriptive of the underlying relationship than in
the other two measures, but cannot defend this
statistically and would hesitate to attach a
standard error to the obtained value .277. (The
σ_{r_t} given by [10:88] is .016.)

If the assumption of normality for both
underlying distributions is unquestionable, the
following formula, due to Pearson (1913 Prob.), gives
an approximate answer for the variance error of
tetrachoric r:


V(r t ) = — — - (l~r\ ) [1- ( Sin ~ ) 2 1 
* Nzz 90° 


[ 10 : 88 ] 


As we may, in general, believe that [10:88]
understates the variance error, the result given
by it may be taken as a lower estimate, and when
so taken we find that tetrachoric r is very
unreliable when dichotomies are extreme.

If the two-by-two-fold is such that
p = q = p' = q' = .5, then [10:87] simplifies to

$$r_t = \cos(2\pi\beta) \qquad \text{Tetrachoric correlation when dichotomic lines are the medians} \qquad [10:89]$$

This is a useful formula to use in the
preliminary investigation of paired graduated
measures. Calling cases above the median +, for each
variable, and cases below the median −, a simple
count gives the number of pairings of unlike
sign. This number divided by N equals 2β, which
is U, the proportion of unlike sign pairs, of
formula [10:89a].

$$r_t = \cos(\pi U) \qquad \text{Unlike signs formula for } r_t \qquad [10:89a]$$
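The counting procedure behind [10:89a] is short enough to sketch directly in Python. The function name is hypothetical, and dropping pairs in which either score falls exactly on its median is an assumption made here, because the text does not say how such ties are handled.

```python
from math import cos, pi
from statistics import median

def median_tetrachoric(xs, ys):
    """r_t = cos(pi * U) [10:89a], U = proportion of unlike-sign pairs about the medians."""
    mx, my = median(xs), median(ys)
    # sign of each score relative to its own median: +1, -1, or 0 when on the median
    signs = [((x > mx) - (x < mx), (y > my) - (y < my)) for x, y in zip(xs, ys)]
    signs = [(sx, sy) for sx, sy in signs if sx != 0 and sy != 0]  # assumed tie handling
    unlike = sum(1 for sx, sy in signs if sx != sy)
    return cos(pi * unlike / len(signs))

# Hypothetical paired scores: 2 of the 6 pairs have unlike signs, so U = 1/3
print(round(median_tetrachoric([1, 2, 3, 4, 5, 6], [2, 1, 4, 3, 6, 5]), 6))  # 0.5
```

When every pair has like signs, U = 0 and r_t = 1. When every pair has unlike signs, U = 1 and r_t = −1.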


We may use for the variance error in this case 


2rr a/3 


Variance error of 
when dichotomic lines 
are at the medians 


[10:90] 


In standardized test construction it is
common (a) to use a preliminary form of a test
having more items (scored right or wrong) than
will be retained in the final form; (b) to give
this preliminary form to an experimental group
for the members of which there can be gotten a
dichotomous criterion score (if the criterion
score is graduated, it is frequently dichotomized
at a point at or near the median); and (c) to
appraise the merits of the separate items by
their correlations with the criterion. Yule's φ,
being a product moment correlation, is, in the
least squared error sense, the best measure of
excellence, and any other measure, such as r_t,
is a less accurate statement of the relative
excellence of the item in its power to predict the
dichotomous criterion score. However, r_t has
been widely used in this situation, and though it
has been, in the writer's opinion, generally
misused, there nevertheless is a real argument in
its favor when the experimental group is more
homogeneous than the future population to which
it is anticipated the test will be given. For
example, consider an Air Corps flying aptitude
test standardized upon the basis of Air Corps
Cadets, but used later upon the more
heterogeneous group of Air Corps cadet candidates. An
easy item upon which, say, 90 per cent of cadets
and 50 per cent of candidates pass yields in the
experimental group a considerably higher r_t than
φ. The φ reveals the excellence of the item for
such samples as the experimental group, whereas
r_t will in general be a better estimate than φ
of the excellence of the item for candidate
groups. The general policy of using an item or
a test for a different range of talent than that
for which there is validation evidence can be
cogently criticized, but such practice is at
times unavoidable, and logical considerations may
suggest that r_t, which is an estimate of a
situation other than that observed, may have
practical merit, in spite of many shortcomings,
including the lack of any precise tests of
significance. This latter lack is much more serious
in factor analysis studies than in item analyses,
so the use of r_t's in factor analysis seems to
the writer to be inexpedient.

SECTION 13. ADJUSTMENTS FOR COARSENESS OF GROUPING

The problems that arise generally concern 
theoretical issues of correlation, but practical 
problems are occasionally present. 

We will call the measure being predicted the
criterion, or the predictand, and the other
measure the predictor, or independent variable. In
connection with the following regression equation,
Ȳ is the prediction, Y the predictand, and X the
predictor:

$$\bar{Y} = b_{yx}X + M_y - b_{yx}M_x$$

Obviously if X is now, and will continue in the
future to be, available in a small number of
classes only, there is no point in estimating the
correlation, or the regression, that would
obtain were X a finely graduated measure. In this
case no correction for grouping as regards X is
serviceable.

A similar argument does not always apply to 
the predictand. Suppose the criterion to be a
two-category measure, "superior-inferior," of
medical efficiency. Though only two classes are 
available in the experimental study determining 
the relationship, one would ordinarily be more 
concerned with predicting degrees of efficiency 
than with predicting a two-category status. Thus 
for the Y variable we may make the most reasona- 
ble correction for grouping that is possible. 
This correction, in the case of a presumably 
normal distribution underlying a dichotomous cri- 
terion, is given by biserial r and the attendant 
regression equation. Even so, the procedure has 
the serious drawback that precise tests of sig- 
nificance and precise knowledge of the standard 
error of estimate are not available. However, if
y' is the continuous variable underlying the
dichotomous Y measures, the prediction equation is
given by [10:76] and the best, though still 
questionable, estimate of the variance error of 
prediction is given by [10:78]. 

If the number of classes in the criterion is 
greater than 11, we seldom need to be concerned 
with a correction for grouping. 

If the number of classes in the criterion lies 
between 5 and 12, we have available one of two 
corrections depending upon whether the variable 
is of class indexes or of class means. 

Corrections when class indexes are the
variable: We will illustrate this problem by the
scatter diagrams of Tables X H and X I. The
proportionate frequencies listed in the cells are
those of a normal bivariate distribution with the
coarse groupings given. In each instance the
correlation for the non-grouped data would be
.80, and the variance would be 1.
TABLE X H

NORMAL BIVARIATE DISTRIBUTION IN WHICH THE
CORRELATION IS .80, IN CASE OF COARSE GROUPING

[Seven-by-seven table of cell proportions; total 1.0000]

CLASS INDEXES        -3.       -2.       -1.        .0        1.        2.        3.
CLASS MEANS      -2.8226   -1.8481    -.9206        .0     .9206    1.8481    2.8226

TABLE X I

NORMAL BIVARIATE DISTRIBUTION IN WHICH THE
CORRELATION IS .80, IN CASE OF VERY COARSE GROUPING

                              X CLASS INDEX
                           -2.         .0         2.
Y CLASS INDEX    2.    .000056    .060963    .097637
                 .0    .060963    .560760    .060963
                -2.    .097637    .060963    .000056

CLASS INDEXES              -2.         .0         2.
CLASS MEANS            -1.5251         .0     1.5251
First consider the case in which the available
variable is the class index. We will designate
the variables given by the class indexes y_i
and x_i, the unavailable continuous variables by
y and x, and true statistics (that is, statistics
of these unavailable variables) by a tilde; when a
statistic is corrected for coarseness of grouping
we so indicate by the preceding subscript c. The
true statistics for both Tables X H and X I are:

$$\tilde{V}_y = 1;\quad \tilde{V}_x = 1;\quad \tilde{C}_{xy} = .80;\quad \tilde{r}_{xy} = .80$$

Table X H (in which there is a slight error
because the distributions have been assumed to
terminate at ±3.5σ) yields:

$$V_{y_i} = 1.08001;\quad V_{x_i} = 1.08001;\quad C_{x_i y_i} = .79728;\quad r_{x_i y_i} = .73822$$

We note that there is negligible error in the
covariance, but decided error in the variance
and the correlation coefficient. In the case of
a variable, such as a normal variable, having
high order contact at both lower and upper ends,
Sheppard's correction for the variance [6:49] is
highly serviceable. Applying it we obtain

$${}_{c}V_{y_i} = .99668;\quad {}_{c}V_{x_i} = .99668;\quad {}_{c}r_{x_i y_i} = .79994$$

Let us conclude that, for variables having high
order contact, Sheppard's correction to the variance
will yield excellent approximations to true
variances and correlations when the number of
classes in each variable is six or greater.
No such excellent results are obtained with
Table X I, where there are but three classes in
each variable. We find (with a computational
error because of terminating distributions at
±3σ) that:

$$V_{y_i} = 1.26924;\quad V_{x_i} = 1.26924;\quad C_{x_i y_i} = .78065;\quad r_{x_i y_i} = .61505$$

$${}_{c}V_{y_i} = .93591;\quad {}_{c}V_{x_i} = .93591;\quad {}_{c}r_{x_i y_i} = .83411$$

These results suggest that there is only a small
error in the covariance with very coarse grouping,
but that the error in variance and correlation is
serious in this instance and is not adequately
allowed for by Sheppard's correction.
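The Table X I figures for class indexes, together with Sheppard's correction, can be recomputed from the nine cell proportions. The Python sketch below assumes the class interval is h = 2 index units, so the correction subtracts h²/12 from each variance, and it leaves the covariance uncorrected, as the text does. Small differences in the fifth decimal place come from rounding in the tabled proportions.

```python
# Table X I cell proportions keyed by (Y class index, X class index)
cells = {
    (2, -2): 0.000056, (2, 0): 0.060963, (2, 2): 0.097637,
    (0, -2): 0.060963, (0, 0): 0.560760, (0, 2): 0.060963,
    (-2, -2): 0.097637, (-2, 0): 0.060963, (-2, 2): 0.000056,
}

def grouped_moments(cells):
    """Variances and covariance of the class indexes, weighted by cell proportion."""
    total = sum(cells.values())
    my = sum(y * f for (y, x), f in cells.items()) / total
    mx = sum(x * f for (y, x), f in cells.items()) / total
    vy = sum((y - my) ** 2 * f for (y, x), f in cells.items()) / total
    vx = sum((x - mx) ** 2 * f for (y, x), f in cells.items()) / total
    cxy = sum((x - mx) * (y - my) * f for (y, x), f in cells.items()) / total
    return vy, vx, cxy

vy, vx, cxy = grouped_moments(cells)
r = cxy / (vy * vx) ** 0.5
h = 2.0                                        # class interval in index units (assumed)
vy_c, vx_c = vy - h**2 / 12, vx - h**2 / 12    # Sheppard's correction to each variance
r_c = cxy / (vy_c * vx_c) ** 0.5               # covariance left uncorrected
print(round(vy, 5), round(cxy, 5), round(r, 5), round(r_c, 5))
```

The corrected r of about .834 still overshoots the true .80, which is the text's point about three classes.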

Corrections when class means are the variable:
The class means given in Tables X H and X I are
computed by formula [8:27]. Let these, which we
will indicate by the subscript m, constitute the
variables, and let the preceding subscript c'
indicate a correction for coarseness of grouping.
Computation from Table X H yields:

$$V_{y_m} = .92267;\quad V_{x_m} = .92267;\quad C_{x_m y_m} = .68662;\quad r_{x_m y_m} = .74417$$

From Table X I we obtain

$$V_{y_m} = .73807;\quad V_{x_m} = .73807;\quad C_{x_m y_m} = .45395;\quad r_{x_m y_m} = .61505$$

Clearly corrections to variance, covariance, and 
correlation are demanded if the problem is such 
as to make corrected results usable and inter- 
pretable. 

Pearson (1913 Inf.) has shown that a correction
based upon the correlation between the continuous
variable, x, and its class means, x_m (and
similarly for the second variable), is useful in
correcting variance, covariance, and correlation.
This correlation, as proven by Kelley (1923), is

$$r_{xx_m} = \frac{\sigma_{x_m}}{\sigma_x} \qquad \text{Correlation between a variate and the means of the classes into which it is recorded} \qquad [10:91]$$

Thus

$${}_{c'}\sigma^2_{x_m} = \frac{\sigma^2_{x_m}}{r^2_{xx_m}} \qquad [10:92]$$

This formula provides a "perfect" correction for
the variance. It, in fact, begs the question, for
all it asserts is that if one knows the true
variance, the correction to the class means
variance is such as to produce the true variance.
Also, as proven by Kelley (1923),

$${}_{c'}r_{x_m y_m} = \frac{r_{x_m y_m}}{r_{xx_m}\,r_{yy_m}} = r_{x_m y_m}\,\frac{\sigma_x\sigma_y}{\sigma_{x_m}\sigma_{y_m}} \qquad \text{Coarseness of grouping correction to } r \text{ on account of class means} \qquad [10:93]$$

If σ_x = σ_y = 1, as when a unit normal
distribution is assumed, [10:93] becomes

$${}_{c'}r_{x_m y_m} = \frac{r_{x_m y_m}}{\sigma_{x_m}\sigma_{y_m}} = \frac{C_{x_m y_m}}{V_{x_m}V_{y_m}} \qquad [10:94]$$

The corrected covariance is

$${}_{c'}C_{x_m y_m} = {}_{c'}r_{x_m y_m}\,\sigma_x\sigma_y = C_{x_m y_m}\,\frac{\sigma_x^2\sigma_y^2}{\sigma^2_{x_m}\sigma^2_{y_m}} \qquad \text{Coarseness of grouping correction to covariance on account of class means} \qquad [10:95]$$

Applying these corrections to the data of
Table X H, we obtain:

$${}_{c'}V_{y_m} = 1.0;\quad {}_{c'}V_{x_m} = 1.0;\quad {}_{c'}C_{x_m y_m} = .80654;\quad {}_{c'}r_{x_m y_m} = .80654$$

Applying these corrections to the data of
Table X I, we obtain:

$${}_{c'}V_{y_m} = 1.0;\quad {}_{c'}V_{x_m} = 1.0;\quad {}_{c'}C_{x_m y_m} = .83332;\quad {}_{c'}r_{x_m y_m} = .83332$$
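The Table X I results for class means can be reproduced with a short Python sketch of [10:94]. It assumes class boundaries at −1 and +1 on the unit normal scale; these boundaries reproduce the tabled class mean 1.5251. Each class mean is computed as the mean of a unit normal variate restricted to its interval, in the spirit of [8:27]. The helper name `class_mean` is hypothetical.

```python
from statistics import NormalDist

# Table X I cell proportions keyed by (Y class index, X class index)
cells = {
    (2, -2): 0.000056, (2, 0): 0.060963, (2, 2): 0.097637,
    (0, -2): 0.060963, (0, 0): 0.560760, (0, 2): 0.060963,
    (-2, -2): 0.097637, (-2, 0): 0.060963, (-2, 2): 0.000056,
}
unit = NormalDist()

def class_mean(a, b):
    """Mean of a unit normal variate restricted to the interval (a, b)."""
    return (unit.pdf(a) - unit.pdf(b)) / (unit.cdf(b) - unit.cdf(a))

m = class_mean(1.0, float("inf"))                    # about 1.5251 (boundary at 1 assumed)
means = {-2: -m, 0: class_mean(-1.0, 1.0), 2: m}     # class index -> class mean
# The means of x_m and y_m are zero by symmetry, so raw moments suffice here.
vm = sum(means[y] ** 2 * f for (y, x), f in cells.items())         # V_{y_m} = V_{x_m}
cm = sum(means[x] * means[y] * f for (y, x), f in cells.items())   # C_{x_m y_m}
r_m = cm / vm
r_corr = r_m / vm        # [10:94]: divide by sigma_{x_m} * sigma_{y_m}, true sigmas = 1
print(round(vm, 5), round(cm, 5), round(r_corr, 5))
```

Because the true variances are 1 here, the corrected covariance [10:95] coincides with the corrected r.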
Thus again we find correction for coarseness of
grouping quite adequate in case we have seven
classes, but not in case of three classes.

We conclude that, whether we employ class
indexes or class means, we have no clearly
established procedure for corrections to variance,
covariance, correlations, and regressions for
coarseness of grouping if the number of classes
is 3, 4, or 5. However, even with 3, 4, or 5
classes the correction formulas just given yield
statistics which are closer to the true values
than are the raw statistics.

Let us now classify the situations in which
corrections for coarseness of grouping may be
employed. If the problem is one of actual
prediction and the predictor is only available, and
in the future will be only available, in the
coarse grouping, no correction for coarseness of
grouping in it should be made. If the predictand
is only available in the coarse grouping and no
utility attaches to a finer grouping, no
correction for coarseness of grouping should be made
in this variable. If the predictand, though only
available in the coarse grouping, is the
phenomenological expression of a noumenal continuum,
knowledge of which would be of greater value
than knowledge of the actual phenomena themselves,
then a correction of the predictand for coarseness
of grouping may be desirable. It is desirable if
the relationship is substantial and the sample
large, so that no test of significance is needed,
but it is not desirable if the null hypothesis
postulated is so moot that a precise test is
important. Finally, if both predictand and
predictor are limited expressions of continuous
variables, knowledge of the relationship between
which answers an important theoretical issue, then
corrections for coarseness of grouping in both
variables are advisable, provided a need for a null
hypothesis test is not primal.

Further corrections, especially for attenuation,
for range, and for nonlinearity, are considered in
later chapters.

SECTION 14. CORRELATIONS AND OTHER STATISTICS
OF SUMS AND DIFFERENCES


If

$$X_\alpha = w_1X_1 + w_2X_2 + \cdots + w_kX_k \qquad \text{A weighted sum} \qquad [10:96a]$$

we may deal with deviations from means and write

$$x_\alpha = w_1x_1 + w_2x_2 + \cdots + w_kx_k$$

Letting S^{k} stand for a summation of k terms as i
varies from 1 to k, and letting S^{k(k−1)} stand for
a summation of k(k−1) terms (i.e., the number of
permutations of k things two at a time) as i and
j vary from 1 to k under the restriction that
i ≠ j, it is readily derived that

$$V_\alpha = S^{k}\, w_i^2 V_i + S^{k(k-1)}\, w_i w_j \sigma_i \sigma_j r_{ij} \qquad [10:97]$$

For a second weighted sum we use distinctive
subscripts as follows:

$$X_\beta = w_aX_a + w_bX_b + \cdots + w_tX_t \qquad [10:96b]$$

The covariance between X_α and X_β is

$$C_{\alpha\beta} = S^{kt}\, w_i w_g \sigma_i \sigma_g r_{ig} \qquad [10:98]$$

in which i runs over 1, ..., k and g over a, ..., t.
The correlation between these weighted sums is

$$r_{\alpha\beta} = \frac{S^{kt}\, w_i w_g \sigma_i \sigma_g r_{ig}}{\sqrt{V_\alpha V_\beta}} \qquad \text{Correlation between a first and a second weighted sum} \qquad [10:99]$$
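Formulas [10:97] to [10:99] amount to quadratic and bilinear forms in the covariance matrix of the component measures. The Python/NumPy sketch below illustrates this and checks the result against the correlation computed directly from the weighted sums. The function name, the weights, and the randomly generated data are all hypothetical and serve only to show the identity.

```python
import numpy as np

def weighted_sum_stats(data_a, w_a, data_b, w_b):
    """V_alpha [10:97], C_alpha_beta [10:98], r_alpha_beta [10:99]."""
    k = data_a.shape[1]
    # population (divide-by-N) covariances of all k + t component measures
    cov = np.cov(np.hstack([data_a, data_b]), rowvar=False, bias=True)
    s_aa, s_bb, s_ab = cov[:k, :k], cov[k:, k:], cov[:k, k:]
    v_alpha = w_a @ s_aa @ w_a      # diagonal terms plus the k(k-1) cross terms
    v_beta = w_b @ s_bb @ w_b
    c_ab = w_a @ s_ab @ w_b         # the kt cross-covariances w_i w_g C_ig
    return v_alpha, c_ab, c_ab / np.sqrt(v_alpha * v_beta)

rng = np.random.default_rng(0)      # hypothetical data for illustration only
data_a = rng.normal(size=(200, 3))
data_b = data_a[:, :2] + rng.normal(size=(200, 2))
w_a, w_b = np.array([1.0, 2.0, 0.5]), np.array([1.0, -1.0])
v_alpha, c_ab, r_ab = weighted_sum_stats(data_a, w_a, data_b, w_b)
```

The result agrees with `np.corrcoef(data_a @ w_a, data_b @ w_b)` up to rounding, because [10:97] to [10:99] are exact algebraic identities rather than approximations.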


If all the measures which combined yield X_α
are similar, as for example would be the case
if they are similar forms of some test, and if
the same holds for the measures which are combined
to yield X_β, we have approximately

$$\sigma_1 = \sigma_2 = \cdots = \sigma_k, \qquad r_{12} = r_{13} = \cdots = r_{k-1,k},$$

$$\sigma_a = \sigma_b = \cdots = \sigma_t, \qquad r_{ab} = r_{ac} = \cdots = r_{t-1,t}.$$

We use r̄_ij to designate the average of the
k(k−1)/2 correlations r_12, r_13, etc.; we use r̄_gh to
designate the average of the t(t−1)/2 correlations
r_ab, r_ac, etc.; and we use r̄_ig to designate the
average of the kt correlations r_1a, r_1b, ..., r_kt.
We immediately obtain

$$r_{\alpha\beta} = \frac{kt\,\bar{r}_{ig}}{\sqrt{k + k(k-1)\bar{r}_{ij}}\,\sqrt{t + t(t-1)\bar{r}_{gh}}} \qquad \text{Correlation between sums, each consisting of equally weighted measures} \qquad [10:100]$$

As an approximation to [10:100] we may write

$$r_{\alpha\beta} = \frac{kt\,r_{1a}}{\sqrt{k + k(k-1)r_{12}}\,\sqrt{t + t(t-1)r_{ab}}} \qquad \text{Approximate correlation between sums when the measures entering each are similar, equally weighted, and equally variable} \qquad [10:101]$$

Precise equality of standard deviations may be
brought about by normalizing the measures of a
series, or by ranking them, for then each variable
will have rank scores from 1 to N.
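Formula [10:100] depends only on k, t, and the three average correlations. The minimal Python sketch below uses a hypothetical function name and illustrative correlation values. It shows the two limiting cases: with k = t = 1 the formula returns r̄_ig itself, and as k and t grow large it approaches r̄_ig/√(r̄_ij r̄_gh), the coefficient corrected for attenuation.

```python
from math import sqrt

def corr_of_sums(k, t, r_ig, r_ij, r_gh):
    """[10:100]: correlation between a sum of k and a sum of t equally weighted,
    equally variable measures, from the three average correlations."""
    return k * t * r_ig / (sqrt(k + k * (k - 1) * r_ij) * sqrt(t + t * (t - 1) * r_gh))

# Illustrative values: average cross r = .3, within-set averages .6 and .5
print(round(corr_of_sums(4, 4, 0.3, 0.6, 0.5), 3))  # 0.454
```

Lengthening both sums raises the correlation toward the attenuation-corrected value, here √.3 ≈ .548.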

Let us consider the case in which x_β consists
of a single measure, which we will now call x_0,
and in which x_α = x_1 + x_2 + ... + x_k, each of
x_1, x_2, ..., x_k being a series of ranks from 1 to
N. Then we can readily compute V_α from the data
and obtain ρ̄_ij from [10:97], which in this case
may be written


$$V_\alpha = V_x\left[k + k(k-1)\bar{\rho}_{ij}\right] = \frac{N^2-1}{12}\left[k + k(k-1)\bar{\rho}_{ij}\right] \qquad [10:102]$$

Yielding the average rank intercorrelation
from the variance of a sum of ranks

Further

$$\bar{\rho}_{0i} = \frac{r_{0\alpha}\,\sigma_\alpha}{k\,\sigma_x} \qquad [10:103]$$

Yielding the average correlation of the separate parts of
a sum (of measures which are ranks) with the criterion
from the correlation of the sum with the criterion

Working with raw ranks we have

$$V_\alpha = \frac{\sum X_\alpha^2}{N} - \frac{k^2(N+1)^2}{4} \qquad \text{The variance of } N \text{ measures, each being the sum of } k \text{ rank scores} \qquad [10:104]$$

Combining [10:102] and [10:104] we obtain

$$\bar{\rho}_{ij} = \frac{12\sum X_\alpha^2}{k(k-1)N(N^2-1)} - \frac{3k(N+1) + N - 1}{(k-1)(N-1)} \qquad [10:105]$$

Average intercorrelation between k series of N ranks each

The variance error of ρ̄_ij is

$$V(\bar{\rho}_{ij}) = \left[\frac{12}{(N^2-1)\,k(k-1)}\right]^2 V(V_\alpha) \qquad [10:105a]$$
The following problem illustrates the use of this formula. Six judges, K, T, U, B, L, and H, rank twelve answers to a given problem according to merit, as follows:


TABLE X J
RANKS GIVEN BY JUDGES

Answer    K     T     U     B     L     H     X_a     X_a^2
A         …     …     …     …     …     …     30        900
B         …     …     …     …     …     …     30.5      930.25
C         …     …     …     …     …     …     13.5      182.25
D         4     2     2    11     8     3     30        900
E         5     …     …     …     …     …     35      1,225
F         6     1     8     2     5     1     23        529
G         7    11    10     8    12     4     52      2,704
H         8     9     5     7     6    11     46      2,116
I         9     …     …     …     …     …     47      2,209
J        10     7    11     5     9     8     50      2,500
K        11    10    12     9    10    12     64      4,096
L        12     8     6     3    11     7     47      2,209
Total                                                20,500.50


$k = 6$, $N = 12$, $\sum X_a^2 = 20{,}500.50$; therefore, by [10:105], $\bar\rho_{ij} = .3241$.
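As a check on the arithmetic, here is a minimal Python sketch of [10:105]. The function and variable names are illustrative and do not come from the text. The first call uses the $X_a$ column of Table X J. The second compares the formula with the mean of all pairwise rank correlations for randomly generated, untied rankings (made-up data). For untied ranks the two should agree exactly, because every series of ranks has the same variance $(N^2-1)/12$.

```python
import itertools
import random


def avg_rank_intercorrelation(rank_sums, k):
    """Formula [10:105]: average intercorrelation of k series of N ranks,
    computed from the N sums of ranks X_a (one sum per object ranked)."""
    n = len(rank_sums)
    sum_sq = sum(x * x for x in rank_sums)
    return (12 * sum_sq / (k * (k - 1) * n * (n * n - 1))
            - (3 * k * (n + 1) + n - 1) / ((k - 1) * (n - 1)))


def pearson(x, y):
    """Product-moment correlation (used here only for the cross-check)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# X_a column of Table X J (sums of the six judges' ranks for answers A to L).
table_sums = [30, 30.5, 13.5, 30, 35, 23, 52, 46, 47, 50, 64, 47]
print(round(avg_rank_intercorrelation(table_sums, k=6), 4))  # 0.3241

# Cross-check with made-up, untied rankings: 5 judges each rank 10 objects.
rng = random.Random(1)
judges = [rng.sample(range(1, 11), 10) for _ in range(5)]
sums = [sum(object_ranks) for object_ranks in zip(*judges)]
pairwise = [pearson(a, b) for a, b in itertools.combinations(judges, 2)]
print(avg_rank_intercorrelation(sums, k=5), sum(pairwise) / len(pairwise))
```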

We note that if $k$ and $t$ each approach infinity, $r_{ab}$ is simply a coefficient corrected for attenuation, and [10:100] becomes [11:25].
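A minimal Python sketch shows the limiting behavior just described. It takes [10:100] in the form $r_{ab} = kt\,\bar r_{ig}/\{\sqrt{k+k(k-1)\bar r_{ij}}\,\sqrt{t+t(t-1)\bar r_{gh}}\}$. The function names and the average correlations .30, .40, and .50 are illustrative assumptions, not values from the text. As $k = t$ grows, the correlation between the sums rises toward $\bar r_{ig}/\sqrt{\bar r_{ij}\,\bar r_{gh}}$, which is the form of Spearman's correction for attenuation.

```python
import math


def corr_between_sums(k, t, r_ig, r_ij, r_gh):
    """[10:100]: correlation between a sum of k and a sum of t equally weighted
    measures of equal standard deviation.
    r_ig: mean correlation between a measure in one sum and one in the other;
    r_ij, r_gh: mean intercorrelations within the first and second sums."""
    numerator = k * t * r_ig
    denominator = math.sqrt(k + k * (k - 1) * r_ij) * math.sqrt(t + t * (t - 1) * r_gh)
    return numerator / denominator


def attenuation_limit(r_ig, r_ij, r_gh):
    """Value approached as k and t grow without limit (the form of [11:25])."""
    return r_ig / math.sqrt(r_ij * r_gh)


# Hypothetical average correlations, chosen only for illustration.
for n in (1, 5, 25, 1000):
    print(n, round(corr_between_sums(n, n, 0.30, 0.40, 0.50), 4))
print("limit", round(attenuation_limit(0.30, 0.40, 0.50), 4))
```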

If the weighted sum [10:96a] is simply $X_1 - X_2$, which we may call $X_d$, thus

$$X_d = X_1 - X_2,$$

then [10:97] reduces to

$$V_d = V_1 + V_2 - 2c_{12} = V_1 + V_2 - 2\sigma_1\sigma_2 r_{12} \qquad [10:106]$$

The variance of a difference

which is a formula of wide utility.
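Formula [10:106] is an exact algebraic identity when the variances and the covariance are all computed with the same divisor $N$. The short Python sketch below checks it on made-up correlated data. The helper names `pop_var` and `pop_cov` are illustrative, not from the text.

```python
import random


def pop_var(v):
    """Variance with divisor N."""
    m = sum(v) / len(v)
    return sum((a - m) ** 2 for a in v) / len(v)


def pop_cov(u, v):
    """Covariance with divisor N."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)


# Made-up, positively correlated scores for illustration.
rng = random.Random(0)
x1 = [rng.gauss(50, 10) for _ in range(500)]
x2 = [0.6 * a + rng.gauss(0, 8) for a in x1]
d = [a - b for a, b in zip(x1, x2)]

direct = pop_var(d)
by_formula = pop_var(x1) + pop_var(x2) - 2 * pop_cov(x1, x2)   # [10:106]
print(direct, by_formula)
```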







CHAPTER XI 


FURTHER CORRELATION ISSUES 

SECTION 1. THE VARIOUS CONSEQUENCES OF EMPLOYING
SEMI-RELIABLE INITIAL MEASURES

Classic sampling theory has looked upon the $N$ scores, $X_1$, in a sample as randomly selected from a parent population and has been concerned with the problem of estimating the characteristics of the population from the information revealed by the sample. The observed record or score of a single individual, or case, $a$, is considered to be an utterly trustworthy record. The concept of error attaches to $X_{1a}$ only when $X_{1a}$ is taken as an estimate of a population statistic, say of $M_1$. Also a squared difference $(X_{1a} - X_{1b})^2$ is exactly the square of the difference between the scores of individuals $a$ and $b$, and as such has no error. Only when this squared difference is taken as evidence of the population variance of differences does the concept of error attach to it.

The concept of reliability of original measures is utterly other than this. The number of kernels on an ear of corn from a certain stalk is not a perfect measure of the fruitfulness per ear of the stalk, if it has a second ear, because the number of kernels on this second ear may differ. The price of a wheat product at a certain date, relative to the price at a basic date, is not a perfect wheat index, for a different wheat product may yield a different index. The score of a pupil on a reading test is not a perfect measure of his reading ability, for a second reading test given under similar conditions may reveal a different relative standing. The concept "reliability," or "reliability of measurement," concerns the phenomena of differences in the measurements obtained when the individual is doubly or multiply measured by means of instruments believed and intended to tap the same function. This concept has proven very useful in psychological and educational fields, where it has been widely employed, and we may anticipate a similar utility if it is widely used in other social fields and in the biological and physical sciences. In the physical sciences the concept of unreliability of original observations has a long and honorable history, being subsumed under the topic "errors of observation," but the correlational consequences of these errors have not been pursued in this field to the extent that they have been in the field of mental measurement.

The concept only arises when two or more supposedly similar measures of the same thing are found to differ. Consider the case of an economist who has selected 40 products to incorporate into a cost-of-living index. He has conceived of these 40 items as being, in some essential sense, expressions of a single phenomenon: cost of living. His judgment has been employed in their selection and weighting, if differentially weighted. In combining the items into a single final score, he has thought that the record of each item was an expression either of this common function plus chance, or of this common function plus a non-chance unique function plus chance. Thus for the recorded score on item $i$ of the index we have

$$x_i = x_c + x_u + e_i,$$

in which

$x_c$ = the cost-of-living function,

$x_u$ = a non-chance, non-cost-of-living function unique to item $i$; that is, $x_u$ is not found (except as a matter of chance) in any of the other 39 items of the index,

$e_i$ = an unaccountable, chance, or error influence that has affected $x_i$. It will correlate with no other function (except as a matter of chance).

The unique function, being unique, has the same correlational properties as the chance factor, so we combine them and designate the combination $e_i$. In brief, $x_c$ is a real, or true, measure of the cost of living, of which $x_i$ is a fallible measure. Denoting this true measure, in convenient units, by $x_\infty$, we write

$$x_i = c_i x_\infty + e_i,$$

the $c_i$ being a constant depending upon the particular units of measurement which have been employed. For a second item we have

$$x_j = c_j x_\infty + e_j,$$

and similarly for further items. Any linear weighted combination of these 40 items can be written

$$x_1 = x_\infty + e_1 \qquad [11:01]$$

A fallible measure as a function of a true measure plus a factor operating like a chance factor
For some purposes we may find it convenient to write [11:01] thus:

$$x_1 = c_1 x_\infty + e_1 \qquad [11:02]$$

This differs from [11:01] only in the units of measurement and not in the inherent relationship portrayed.

The expression for $x_1$ asserts that the economist conceives of his index as measuring nothing but cost of living plus irrelevant or chance factors. This attitude toward the total score, $x_1$, extends to the separate items, $x_i$, which have entered into it. He can therefore split the total score into halves. It is customary to represent the half scores by $x_{\mathrm{I}}$ and $x_{\mathrm{II}}$, but to simplify subscript notation we will here call the half scores $x_3$ and $x_5$:

$$x_3 = .5\,x_\infty + e_3, \qquad x_5 = .5\,x_\infty + e_5 \qquad [11:03]$$

Scores on the halves of $x_1$

In these equations the error factors are uncorrelated, being either strictly chance factors or having unique elements which behave like chance factors. Of course $x_\infty$ is uncorrelated with either chance factor. For the total measure we have

$$x_1 = x_3 + x_5 = x_\infty + e_1 \qquad [11:01a]$$

so that $e_1 = e_3 + e_5$. If an $x_2$ measure is involved, we shall designate it

$$x_2 = y_\infty + e_2 \qquad [11:01b]$$



[11:06] SEMI-RELIABLE MEASURES 403 

Also we shall designate a criterion measure

$$x_0 = x_{\infty 0} + e_0 \qquad [11:04]$$

Let us estimate as fully as possible the statistics of $x_\infty$ from a knowledge of $x_1$, $x_3$, and $x_5$. Since $V_3 = .25V_\infty + V_{e_3}$, $V_5 = .25V_\infty + V_{e_5}$, and $c_{35} = .25V_\infty$, the variance of the differences between the half scores is

$$V_{3-5} = V_3 + V_5 - 2c_{35} = V_{e_3} + V_{e_5} = V_{e_1} = V_{1\cdot\infty} \qquad [11:05]$$

This is a function of the errors of measurement only. It gives us Rulon's (1939) formula for the standard error of measurement:

$$\sigma_{1\cdot\infty} = \sigma_{3-5} \qquad [11:06]$$

Standard error of estimate when non-regressed $x_1$ is taken as evidence of $x_\infty$

The notation $\sigma_{1\cdot\infty}$ in this connection requires explanation. Consider the scatter diagram whose dimensions are $x_\infty$ and $x_1$. One can readily prove that the standard deviation of arrays of $x_1$ for fixed values of $x_\infty$ (standard notation $\sigma_{1\cdot\infty}$) is exactly $\sigma_{e_1}$, which by [11:05] and [11:06] equals $\sigma_{3-5}$. Accordingly, $\sigma_{1\cdot\infty}$ may correctly be used to indicate the standard error when $x_1$ is estimated from a knowledge of $x_\infty$ (a process actually never performed), or when $x_\infty$ is estimated by using non-regressed $x_1$.

The customary way to split a measure composed of many items into halves is to call the odd-numbered items one half and the even-numbered items the other half. This is frequently entirely satisfactory, but thought should be given to the matter. If a different splitting yields halves which, in the judgment of the experimenter, are more nearly comparable, then this different splitting should be employed. Surely in a cost-of-living index all the semi-luxury items should not be assigned to the same half score, just as in a mental test all the hard items should not be concentrated in a single half. A priori considerations may suggest that some items have larger relative chance factors than others, and if so they should be judiciously distributed between the halves, etc. The writer (1942) would emphasize that there should be full utilization of and dependence upon judgment in splitting a measure into halves, just as there has been such dependence upon it when drawing up the instrument in the first instance. It is in fact desirable that the process of making the instrument be one of making comparable halves.

The suggestion has been made that a measure of $2t$ items can be split into halves in $\tfrac{1}{2}\binom{2t}{t}$ ways, and that the best procedure would be to try out all these ways (when computing reliability coefficients) and find the average result. Theoretically this would seem to the writer sound only in the rare instance in which the deviser considered all items (a) equally excellent, (b) equally subject to chance, and (c) all functioning at the same level. Kuder and Richardson (1937, 1939) have devised formulas for the computation of reliability coefficients when these conditions hold. Their more elaborate and precise formulas scarcely constitute an economy of labor over the formulas for the reliability coefficient given in this section, though it is true that they avoid the perplexing question of how to split a test into halves. Their simplest formula applies to a test whose items are scored "right" or "wrong" (K-R write in terms of mental test problems). It is

$$r_1 = \frac{t}{t-1}\left(1 - \frac{t\,\bar p\,\bar q}{V_1}\right) \qquad [11:07]$$

Kuder-Richardson short formula for the reliability coefficient

in which $t$ is the number of items in the test, $V_1$ is the variance of the total scores, $\bar p$ = the mean item success score, and $\bar q = 1 - \bar p$.

When the special conditions (a), (b), and (c) hold, this reliability coefficient is the same as that given by [11:10]. Rather surprisingly, it also approximates this answer for quite a range of common situations.
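Here is a minimal Python sketch of the Kuder-Richardson short formula [11:07]. It assumes that $\bar p$ is obtained as the mean total score divided by $t$, which is the mean item success score. The test length, mean, and variance in the example are hypothetical.

```python
def kr21(t, mean_total, var_total):
    """Kuder-Richardson short formula [11:07] for t items scored right/wrong.
    mean_total, var_total: mean and variance of the total scores."""
    p = mean_total / t          # mean item success score, p-bar
    q = 1 - p                   # q-bar
    return t / (t - 1) * (1 - t * p * q / var_total)


# Hypothetical test: 50 items, mean total score 30, total-score variance 64.
print(round(kr21(50, 30, 64), 4))
```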

The standard error of estimate, given by Rulon's formula, is the statistic needed in order to attach the proper degree of confidence to an original measure. The numerical steps are straightforward and also contribute to further statistics which are commonly needed. Simply record in four columns the values of $x_3$, $x_5$, $(x_3 - x_5)$, and $x_1\,(= x_3 + x_5)$, and compute the variance of each column. In addition to the essential variance error of estimate, $V_{1\cdot\infty}$, we can now readily obtain C. S. Spearman's reliability coefficient, which is the correlation between similar measures. Referring to the sum and difference formula for correlation, and noting that $V_1 - V_{3-5} = 4c_{35}$, we have

$$r_{35} = \frac{V_1 - V_{3-5}}{4\,\sigma_3\sigma_5} \qquad [11:08]$$

The half-measure reliability via differences between half scores
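The four-column procedure can be sketched in Python as follows. The data are simulated under the model of [11:03]: each half equals $.5x_\infty$ plus its own independent error, and the variances are made up. Population variances with divisor $N$ are used throughout, so that [11:08] reproduces the product-moment $r_{35}$ exactly. With a true-score variance of 100 and error variances of 9 per half, the sample values should land near $r_{35} = 25/34 \approx .735$, $r_1 = 100/118 \approx .847$, and a standard error of measurement of about $\sqrt{18} \approx 4.24$. The function name is illustrative.

```python
import random


def pop_var(v):
    """Variance with divisor N."""
    m = sum(v) / len(v)
    return sum((a - m) ** 2 for a in v) / len(v)


def four_column_statistics(x3, x5):
    """The four-column procedure: columns x3, x5, x3 - x5, and x1 = x3 + x5."""
    diff = [a - b for a, b in zip(x3, x5)]
    x1 = [a + b for a, b in zip(x3, x5)]
    v3, v5, v_diff, v1 = (pop_var(col) for col in (x3, x5, diff, x1))
    r35 = (v1 - v_diff) / (4 * (v3 * v5) ** 0.5)   # [11:08]
    r1 = 2 * r35 / (1 + r35)                        # step-up, [11:10]
    return {
        "se_measurement": v_diff ** 0.5,            # Rulon, [11:06]
        "r35": r35,
        "r1": r1,
        "v1": v1,
        "v_diff": v_diff,
    }


# Simulated half scores following [11:03]; all numbers are made up.
rng = random.Random(42)
true_scores = [rng.gauss(0, 10) for _ in range(2000)]
x3 = [0.5 * s + rng.gauss(0, 3) for s in true_scores]
x5 = [0.5 * s + rng.gauss(0, 3) for s in true_scores]
stats = four_column_statistics(x3, x5)
print({name: round(value, 3) for name, value in stats.items()})
```

The last two entries also let one confirm [11:14], $V_{3-5} \approx V_1(1 - r_1)$, which holds closely here because the two half-score variances are nearly equal.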


We desire $r_1$, or, as it is frequently designated, $r_{11}$: the correlation between $x_1$ and a postulated similar measure. We shall obtain this by first deriving the Spearman-Brown step-up formula, which gives the reliability of a measure $n$ times as long as the measure whose reliability has been experimentally determined. Usually one has computed $r_{35}$ and desires $r_1$. In that case $n = 2$, for $x_1$ is the sum of the two similar measures $x_3$ and $x_5$, the reliability of which is the computed value $r_{35}$. Let

$$x_n = x_3 + x_5 + \dots + x_{2n+1}$$

be a measure which is the sum of $n$ similar parts, and let $x_N$, likewise the sum of $n$ parts, be a measure similar to $x_n$. All parts of $x_n$ and of $x_N$ are similar, so that each part has variance $V_3$ and each pair of parts has covariance $c_{35}$. Then

$$r_{nN} = \frac{c_{nN}}{V_n} = \frac{n^2 c_{35}}{nV_3 + (n^2 - n)\,c_{35}} = \frac{n\,r_{35}}{1 + (n-1)\,r_{35}} \qquad [11:09]$$

Spearman-Brown step-up formula
Spearman-Brown step~up formula 

Solving [11:09] for $r_{35}$ we obtain the step-down formula

$$r_{35} = \frac{r_n}{r_n + n(1 - r_n)} \qquad [11:09a]$$

Solving [11:09] for $n$ we obtain

$$n = \frac{r_n(1 - r_{35})}{r_{35}(1 - r_n)} \qquad [11:09b]$$

which gives $n$ when $r_{35}$ is known and some reliability $r_n$ is desired. The variance error of $n$ thus derived is

$$V_n = \frac{n^2(1 + r_{35})^2}{r_{35}^2\,(N - 2)} \qquad [11:09c]$$

$N$ being the number of cases.

For the step-up from the half-measure reliability, [11:09] becomes

$$r_1 = \frac{2\,r_{35}}{1 + r_{35}} \qquad [11:10]$$


The variance error of $r_n$ is consequent to the error in $r_{35}$. By the method of Chapter XIII, Section 8, we obtain Shen's (1924) formula for the standard error of $r_n$:

$$\sigma_{r_n} = \frac{n\,\sigma_{r_{35}}}{[1 + (n-1)\,r_{35}]^2} \qquad [11:11]$$

The standard error of a stepped-up reliability coefficient


The standard error of $r_{35}$ is given by [10:46]. Utilizing this together with [11:10], we find that when the step-up is from the half score to the total score, [11:11] becomes

$$\sigma_{r_1} = \frac{2(1 - r_1)}{\sqrt{N - 2}} \qquad [11:12]$$

Standard error of $r_1$ derived via half scores (here $N$ is the number of cases in the sample)
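The following Python sketch checks numerically that Shen's formula [11:11] with $n = 2$ agrees with the half-score form [11:12]. It assumes the standard error of a correlation is $(1 - r^2)/\sqrt{N - 2}$. Under that assumption the two expressions are algebraically identical, because $(1 - r_{35}^2)/(1 + r_{35})^2 = (1 - r_{35})/(1 + r_{35}) = 1 - r_1$. The values $r_{35} = .60$ and $N = 100$ are hypothetical.

```python
import math


def se_r(r, n_cases):
    """Standard error of a correlation, assumed here to be (1 - r^2)/sqrt(N - 2)."""
    return (1 - r * r) / math.sqrt(n_cases - 2)


def se_stepped_up(r35, se_r35, n):
    """Shen's formula [11:11] for the standard error of a stepped-up reliability."""
    return n * se_r35 / (1 + (n - 1) * r35) ** 2


# Hypothetical: half-score correlation .60 from 100 cases.
r35, n_cases = 0.60, 100
r1 = 2 * r35 / (1 + r35)                                     # [11:10]
via_shen = se_stepped_up(r35, se_r(r35, n_cases), 2)          # [11:11] with n = 2
via_half_score_form = 2 * (1 - r1) / math.sqrt(n_cases - 2)   # [11:12]
print(round(via_shen, 4), round(via_half_score_form, 4))
```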


Utilizing relationships [11:05], [11:10], and

$$V_1 = V_3 + V_5 + 2c_{35} = 2V_3(1 + r_{35}) \qquad [11:13]$$

Variance of the sum of two similar measures

we obtain

$$V_{1\cdot\infty} = V_1(1 - r_1) \qquad [11:14]$$

Variance error of estimate as a function of the variance and reliability coefficient


Illustrations from the field of mental measurement of functions involving the reliability coefficient: A score on a psychological test is fallible and may be set equal to a true ability plus an error, as indicated in [11:01]. Having the observed test score, $x_1$, we clearly desire as much information as possible about the true ability, $x_\infty$. Dealing with gross scores, rather than deviation measures, we write

$$X_1 = X_\infty + e_1 \qquad [11:01]$$

Observed ability = true ability + an error

The $e_1$'s are such that their mean for a large sample tends towards zero. This must be so, for otherwise there is something in the $e_1$'s which is not chance, and this would then be a part of $X_\infty$. Accordingly, the mean of the $X_1$'s is equal to the mean of the $X_\infty$'s within limits represented by the variance error of $M_1$ for given values of $X_\infty$, namely $V_e/N$. This variance error is only $1/N$ as large as that attaching to the individual score. Unless $N$ is very small, therefore, we may interpret individual scores on the basis that

$$M_\infty = M_1 \qquad [11:15]$$

The mean of true ability = the mean of observed ability

Dealing with deviation scores, we can obtain the variance of both members of [11:01], $V_1 = V_\infty + V_{e_1}$, utilize [11:14], and obtain

$$V_\infty = r_1 V_1 \qquad [11:16]$$

An estimate of the variance of true ability

and

$$\sigma_\infty = \sigma_1\sqrt{r_1} \qquad [11:16a]$$

An estimate of the standard deviation of true ability

This relationship, and the further relationships given here involving "true ability," are strictly true in the population and approximately so in the sample. We note that the observed variability in a group tends to be greater than the true variability. With instruments of low reliability the difference may be so great as materially to change the picture of events. For a certain reading test, with a reliability of about .36, an investigator found that some 47 per cent of fifth-grade pupils fell below the fourth-grade mean or above the sixth-grade mean, thus indicating serious misgrading or misclassification. Utilizing [11:16] and assuming a normal distribution, it is simple to compute the percentage which, in true ability, lies outside these same limits. The answer is 23, only half the percentage given by the raw scores.
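The reading-test figures can be reproduced with Python's `statistics.NormalDist`. The sketch assumes that the fourth- and sixth-grade means lie symmetrically about the fifth-grade mean, so that "47 per cent outside" corresponds to a symmetric band around the mean; the computation described in the text appears to make the same assumption. Variable names are ours.

```python
from statistics import NormalDist

Z = NormalDist()   # standard normal


def share_beyond(half_width):
    """Proportion of a normal distribution lying more than half_width SDs from its mean."""
    return 2 * (1 - Z.cdf(half_width))


reliability = 0.36
observed_share = 0.47
# Half-width of the limits in observed-score SD units (assumed symmetric
# about the fifth-grade mean), found by inverting the normal curve.
h_observed = Z.inv_cdf(1 - observed_share / 2)
# By [11:16a] the true SD is sqrt(r1) times the observed SD, so the same limits
# lie h_observed / sqrt(r1) true-score SDs from the mean.
true_share = share_beyond(h_observed / reliability ** 0.5)
print(round(h_observed, 3), round(true_share, 2))
```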
We can estimate an individual's $X_\infty$ from his $X_1$. In deviation form,

$$c_{\infty 1} = \frac{1}{N}\sum x_\infty x_1 = \frac{1}{N}\sum x_\infty(x_\infty + e_1) = V_\infty \qquad [11:17]$$

Obtaining $\sigma_\infty$ from [11:16a], we find that

$$r_{\infty 1} = \sqrt{r_1} \qquad [11:18]$$

Correlation between a true and a fallible measure of the same function

and

$$\bar x_\infty = r_1 x_1 \qquad [11:19]$$

$$\bar X_\infty = r_1 X_1 + (1 - r_1)\,M_1 \qquad [11:20]$$

Regression of true ability upon a fallible measure of it


This is an interesting equation in that it expresses the estimate of true ability as a weighted sum of two separate estimates: one based upon the individual's observed score, $X_1$, and the other based upon the mean of the group to which he belongs, $M_1$. If the test is highly reliable, much weight is given to the test score and little to the group mean, and vice versa. Suppose fourth-grade pupil A and fifth-grade pupil B each score 45 on a test having a reliability of .80 in each grade. Suppose also that the means and standard deviations for the grades are $M_4 = 40$, $\sigma_4 = 10$, $M_5 = 50$, and $\sigma_5 = 10$. For pupil A we estimate his true ability thus:

$$\bar X_\infty = .80(45) + .20(40) = 44$$

For pupil B,

$$\bar X_\infty = .80(45) + .20(50) = 46$$

This difference in outcome is certainly sound. 
We know two things about pupil A. The first fact ($X_1 = 45$) suggests a true ability of 45, and the second fact (membership in a group whose mean = 40) suggests a true ability of 40. The best composite of his ability is 44, as given by [11:20].
Suppose for the single hour when tested pupil A 
had sat with the fifth grade, would we now use 
the fifth-grade mean and estimate his true ability 
as 46? Certainly not, for pupil A is still a 
fourth-grader. This group membership is not a 
whim, but a thing as definitely attached to pupil 
A as is his score 45. 
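Formula [11:20] can be sketched as follows for pupils A and B. The sketch also computes the two standard errors contrasted later in the text: [11:14], for the raw score taken at face value, and [11:22], for the regressed estimate. Only figures given in the text are used: a score of 45, a reliability of .80, grade means of 40 and 50, and a standard deviation of 10. Function names are illustrative.

```python
import math


def regressed_true_score(x1, r1, group_mean):
    """[11:20]: estimate of true ability as a weighted sum of the observed
    score and the mean of the group to which the individual belongs."""
    return r1 * x1 + (1 - r1) * group_mean


pupil_a = regressed_true_score(45, 0.80, 40)   # fourth grade, mean 40
pupil_b = regressed_true_score(45, 0.80, 50)   # fifth grade, mean 50

# Standard errors with sigma_1 = 10 and r1 = .80:
se_raw = 10 * math.sqrt(1 - 0.80)                 # from [11:14], raw score as the estimate
se_regressed = 10 * math.sqrt(0.80 * (1 - 0.80))  # from [11:22], regressed estimate
print(round(pupil_a, 2), round(pupil_b, 2), round(se_raw, 3), round(se_regressed, 3))
```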

The variance of the estimated true scores for the entire group is clearly

$$V_{\bar x_\infty} = r_1^2 V_1 \qquad [11:21]$$

Variance of estimated true scores (see [11:22])

We note that this is less than $V_\infty$. The relationship

$$V_1 > V_\infty > V_{\bar x_\infty}$$

is important.

The variance error of an estimated true score is given by the usual formula [10:09], which for this situation becomes

$$V_{\infty\cdot 1} = V_\infty\,(1 - r_{\infty 1}^2) = V_1\,r_1(1 - r_1) \qquad [11:22]$$

Variance error of estimate of a true score, $X_\infty$, from $X_1$


This is less than the variance error of estimate given by [11:14]. The difference in the situations must be explicitly noted. When $x_1$ is taken as evidence of $x_\infty$, the variance error of estimate is given by [11:14]. When $x_1$ is regressed, so that $r_1x_1$ is taken as evidence of $x_\infty$, an improved estimate is gotten and the variance error of estimate is given by [11:22]. If the mean and reliability for the group to which the tested person naturally belongs are known, it is always preferable to use the regressed score as the estimate of true ability. Since this best practice is infrequent practice, the pertinent variance error of estimate is usually that given by [11:14].

Formula [11:21] throws interesting light upon the classification of individuals by fallible measures. Suppose upon a scholastic test we have fourth-, fifth-, and sixth-grade means of 40, 50, and 60, and that $\sigma_\infty$ for the fifth grade is 10. Assume a normal distribution of ability, and a rule which demotes fifth-graders who score below 40 and promotes fifth-graders scoring above 60.

If the reliability of the test is 1.00, we obtain the correct result of 16 per cent below 40 and 16 per cent above 60, and thus we reclassify 32 per cent of the pupils. If the test has a reliability of .50, we find with the aid of [11:16a] that $\sigma_1 = 14.14$. Employing raw scores, reference to a normal probability table informs us that we would now reclassify 48 per cent, which is an excessive number. If, however, we use regressed scores, $\bar x_\infty\,(= r_1x_1)$, we have a distribution whose standard deviation is 7.07 (from [11:21]), and we reclassify 16 per cent, which is a conservative number.

It can also be shown, using volumes of a normal bivariate surface as tabled in Pearson (1931), that of the 48 per cent reclassified upon the basis of raw scores, 26/48 did not in truth fall beyond the limits set. Of the 16 per cent reclassified upon the basis of regressed scores, 6/16 did not in truth fall beyond these limits. In short, using a fallible measure at its face value for promotions, classification, etc., will lead to or create many misplacements, while using the same fallible measure properly regressed will create few misplacements. If we will but regress scores and compute standard errors of estimated true scores, we need not hesitate to use an instrument of low reliability.
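The raw-score and regressed-score percentages in this example can be checked with a few lines of Python. The sketch assumes normal distributions centered on the fifth-grade mean of 50, so that the limits 40 and 60 lie 10 points on either side. Names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()
SD_TRUE = 10.0      # sigma of true ability in the fifth grade
HALF_WIDTH = 10.0   # the limits 40 and 60 lie 10 points either side of the mean 50


def share_reclassified(sd):
    """Share of a normal distribution with this SD lying outside 40 to 60."""
    return 2 * (1 - Z.cdf(HALF_WIDTH / sd))


for r1 in (1.00, 0.50):
    sd_raw = SD_TRUE / r1 ** 0.5    # [11:16a] solved for sigma_1
    sd_regressed = r1 * sd_raw      # SD of r1 * x1, from [11:21]
    print(r1, round(sd_raw, 2), round(share_reclassified(sd_raw), 3),
          round(sd_regressed, 2), round(share_reclassified(sd_regressed), 3))
```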

The reliability of measures and issues involving two variables. Let $x_{\infty 0}$ be a true criterion, $x_0$ the fallible criterion of reliability $r_0$, $x_{\infty 1}$ a true predictor, and $x_1$ the fallible predictor of reliability $r_1$. We have the covariance

$$c_{01} = \frac{1}{N}\sum (x_{\infty 0} + e_0)(x_{\infty 1} + e_1) = \frac{1}{N}\Big(\sum x_{\infty 0}x_{\infty 1} + \sum x_{\infty 0}e_1 + \sum x_{\infty 1}e_0 + \sum e_0e_1\Big)$$

These last three summations are chance deviations from zero. We can set each equal to 0, an unbiased estimate of zero. We accordingly reach the following unbiased approximation:

$$c_{01} = c_{\infty 0,\infty 1} \qquad [11:23]$$

so that

$$r_{\infty 0,1} = \frac{c_{\infty 0,1}}{\sigma_{\infty 0}\,\sigma_1} = \frac{c_{01}}{\sigma_0\sqrt{r_0}\;\sigma_1} = \frac{r_{01}}{\sqrt{r_0}} \qquad [11:24]$$

Correlation corrected for attenuation in one variable

By a similar derivation,

$$r_{\infty 0,\infty 1} = \frac{r_{01}}{\sqrt{r_0\,r_1}} \qquad [11:25]$$

C. S. Spearman's formula for correction for attenuation in both variables


If reliabilities are computed by finding the correlation between comparable half scores and stepping up, [11:10], the variance error of $r_{\infty 0,1}$ is given by formula [13:90], and the variance error of $r_{\infty 0,\infty 1}$ by [13:88].

For the estimate of a true score we have, as given by [11:19],

$$\bar x_{\infty 0} = r_{\infty 0,1}\,\frac{\sigma_{\infty 0}}{\sigma_1}\,x_1 = r_{01}\,\frac{\sigma_0}{\sigma_1}\,x_1 \qquad [11:26]$$

Since the estimate of $x_0$ from $x_1$ is

$$\bar x_0 = r_{01}\,\frac{\sigma_0}{\sigma_1}\,x_1,$$

we observe that we reach the same answer in the two cases. However, this common answer is a more accurate estimate of the unknown true measure $x_{\infty 0}$ than of the fallible measure $x_0$, for

$$\sigma_{0\cdot 1} = \sigma_0\sqrt{1 - r_{01}^2}$$

as given by [10:49], and

$$\sigma_{\infty 0\cdot 1} = \sigma_{\infty 0}\sqrt{1 - r_{\infty 0,1}^2} = \sigma_0\sqrt{r_0 - r_{01}^2} \qquad [11:27]$$

Standard error of estimate of a true criterion from $X_1$

This is encouraging. Even if we do not know $r_0$, but do know that the criterion is fallible, it entitles us to believe that our actual estimates of the criterion are more trustworthy than $\sigma_{0\cdot 1}$ indicates. As illustration, we can cite the many studies reporting correlations with teachers' marks. The reliability of teachers' marks seldom exceeds .75, their average is in the neighborhood of .50 (lower for character ratings), and not infrequently they are no higher than .30. In view of these facts, we are entitled to attach considerably more significance to correlations with teachers' marks than has usually been done.
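The attenuation formulas [11:24] and [11:25] are sketched below, together with the two standard errors of estimate compared in the text, $\sigma_{0\cdot 1}$ and [11:27]. The sketch works in units where the observed criterion standard deviation is 1. The correlation of .40 with teachers' marks of reliability .50 is a hypothetical figure, not a result reported in the text.

```python
import math


def corrected_one_variable(r01, r0):
    """[11:24]: correlation of the true criterion with the fallible predictor."""
    return r01 / math.sqrt(r0)


def corrected_both_variables(r01, r0, r1):
    """[11:25]: Spearman's correction for attenuation in both variables."""
    return r01 / math.sqrt(r0 * r1)


def se_fallible_criterion(sd0, r01):
    """sigma_{0.1}: error of estimate when the estimate is read as x0 itself."""
    return sd0 * math.sqrt(1 - r01 ** 2)


def se_true_criterion(sd0, r01, r0):
    """[11:27]: error of the same estimate read as the true criterion (needs r0 >= r01**2)."""
    return sd0 * math.sqrt(r0 - r01 ** 2)


# Hypothetical: predictor correlates .40 with teachers' marks of reliability .50.
r01, r0 = 0.40, 0.50
print(round(corrected_one_variable(r01, r0), 3),
      round(se_fallible_criterion(1.0, r01), 3),
      round(se_true_criterion(1.0, r01, r0), 3))
```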

The significance of differences between fallible measures. The raw scores $X_{1a}$ and $X_{2a}$ of individual $a$ upon two achievement measures are ordinarily in such units that the raw difference $X_{1a} - X_{2a}$ is not interpretable. If individual $a$ belongs to a normally competitive group, a useful procedure is to express scores as standard scores, using the mean and standard deviation of this group. Thus,

$$z_{1a} = \frac{X_{1a} - M_1}{\sigma_1} \quad\text{and}\quad z_{2a} = \frac{X_{2a} - M_2}{\sigma_2}$$

Then $z_{1a} - z_{2a}$ expresses the superiority of individual $a$ in trait 1 over trait 2, relative to the group standard. A difference measure of this sort is doubly contaminated with error: that in $z_{1a}$ and that in $z_{2a}$. Counselors unfamiliar with this generally large error have foolishly trusted observed differences. Other, unduly conservative counselors have been prone to distrust all observed differences and to place their trust in the general-mental-ability doctrine, which considers the level of ability of a person to be the same in all intellectual functions. The precise handling of such within-the-individual differences is not difficult, but, as one would expect, the standard error of the difference $z_{1a} - z_{2a}$ is not infrequently very large.

If this same individual were tested with comparable forms of the two measures, his observed difference would be $z_{\mathrm{I}a} - z_{\mathrm{II}a}$. The correlation between such measures of difference, for the group entire, is the reliability of this difference in standard score measures. We may designate such a difference, $z_{1a} - z_{2a}$ considered for fixed true abilities, by the symbol $d_{12\cdot\infty y}$. We read it "the difference between $z_1$ and $z_2$ for fixed values $x_\infty$ and $y_\infty$," or "the difference between $z_1$ and $z_2$ which is independent of $x_\infty$ and $y_\infty$." This is obviously so, for when the individual is the same, his true abilities $x_\infty$ and $y_\infty$ remain the same while he is being measured by either form of each test.

$$d_{12} = z_1 - z_2 \qquad [11:28]$$

$$V(d_{12}) = 2 - 2r_{12} \qquad [11:28a]$$

$$c(d_{12}, d_{\mathrm{I\,II}}) = r_1 + r_2 - 2r_{12} \qquad [11:29]$$

$$r(d_{12}, d_{\mathrm{I\,II}}) = \frac{r_1 + r_2 - 2r_{12}}{2 - 2r_{12}} \qquad [11:29a]$$

Reliability of differences between the standard scores of an individual

$$d_{12\cdot\infty y} = z_{1\cdot\infty} - z_{2\cdot y} = \frac{e_1}{\sigma_1} - \frac{e_2}{\sigma_2} \qquad [11:30]$$

$$V(d_{12\cdot\infty y}) = 2 - r_1 - r_2 \qquad [11:30a]$$

Variance error of an individual standard score difference

Formula [11:29a] shows that only in case $r_{12}$ is zero or negative does the reliability of the within-the-individual difference compare favorably with the reliability of the measures employed in obtaining the difference.

$V(d_{12})$ can be analyzed into non-chance and chance independent parts, thus:

$$V(d_{12}) = [V(d_{12}) - V(d_{12\cdot\infty y})] + V(d_{12\cdot\infty y}) = V(d_{\infty y}) + V(d_{12\cdot\infty y}) \qquad [11:31]$$

in which

$$d_{\infty y} = \frac{x_\infty}{\sigma_1} - \frac{y_\infty}{\sigma_2} \qquad [11:32]$$

$$V(d_{\infty y}) = r_1 + r_2 - 2r_{12} \quad \text{(see [11:29])} \qquad [11:32a]$$

Accordingly [11:31] may be written

$$2 - 2r_{12} = (r_1 + r_2 - 2r_{12}) + (2 - r_1 - r_2) \qquad [11:31a]$$

Total V = real V + chance V



A precise variance ratio test involving $V(d_{\infty y})$ and $V(d_{12\cdot\infty y})$ would be valuable, but it is lacking because the restrictions that have been placed upon the variables have been nonlinear in character.

Two other measures of difference are important. They are $d'_{\infty y}$ and $\bar d_{\infty y}$. The first is

$$d'_{\infty y} = \frac{x_\infty}{\sigma_\infty} - \frac{y_\infty}{\sigma_y} \qquad [11:33]$$

This is a non-individually observed difference, but one whose variance for the group can be computed. Its variance is

$$V(d'_{\infty y}) = 2 - 2r_{\infty y} \qquad [11:33a]$$

A group measure of idiosyncrasy

This measure of the extent to which the members of the group, in their entirety, are different (within themselves) in the two abilities would be of importance in a study of several groups which have had different types of tutelage. Further, if the variance error of $r_{\infty y}$, as given by [13:88], is such that there is negligible probability that this variance is due to chance, it may then be appropriate to secure individual measures of difference, [11:28] or [11:34].


$$\bar d_{\infty y} = r_1 z_1 - r_2 z_2 \qquad [11:34]$$

Regressed measure of within-the-individual difference (see [11:37])

Writing $\bar d_{\infty y}$ thus:

$$\bar d_{\infty y} = r_1 z_1 - r_2 z_2 = \left(r_1\frac{x_\infty}{\sigma_1} - r_2\frac{y_\infty}{\sigma_2}\right) + \left(r_1\frac{e_1}{\sigma_1} - r_2\frac{e_2}{\sigma_2}\right)$$

we obtain an analysis of the variance into non-chance and chance portions:

$$r_1^2 + r_2^2 - 2r_1r_2r_{12} = (r_1^3 + r_2^3 - 2r_1r_2r_{12}) + (r_1^2 - r_1^3 + r_2^2 - r_2^3) \qquad [11:34b]$$

Of course the restrictions which have been imposed are nonlinear, so a variance ratio test would not be precise. Ordinarily, however, we can secure serviceable information by dealing with critical ratios. The individual $\bar d_{\infty y}/\sigma(\bar d_{\infty y\cdot\infty y})$ is a critical ratio. The mean square of such ratios may be compared with the mean square of $d_{12}/\sigma(d_{12\cdot\infty y})$ in order to ascertain which measure, $\bar d_{\infty y}$ or $d_{12}$, is the more efficient.

$$\frac{V(d_{12})}{V(d_{12\cdot\infty y})} = \frac{2 - 2r_{12}}{2 - r_1 - r_2} \qquad [11:35]$$

$$\frac{V(\bar d_{\infty y})}{V(\bar d_{\infty y\cdot\infty y})} = \frac{r_1^2 + r_2^2 - 2r_1r_2r_{12}}{r_1^2 - r_1^3 + r_2^2 - r_2^3} \qquad [11:36]$$

Substitution of numerical values in these equations shows that when reliabilities are greater than .50, the regressed measure of difference, $\bar d_{\infty y}$, is more trustworthy than the non-regressed measure, $d_{12}$.

The proportion of cases whose idiosyncrasies (within-individual differences) are revealed by fallible measures. We will consider this problem first when the observed measure of difference is $d_{12}$, and second when it is $\bar d_{\infty y}$. Let us assume that all distributions of differences are normal. The ratio $\sigma(d_{12\cdot\infty y})/\sigma(d_{12})$ is the ratio of the standard deviation of differences due to chance to the standard deviation of the observed differences. As this ratio becomes small, an increasing number of the observed differences are not artifacts.

Chart XI I illustrates the situation. The dash-curve is the distribution of differences attributable to chance, and the full-line curve is that of the observed differences, so that the shaded portion is the proportion of differences in excess of the chance proportion. Table XI A, herewith (see Kelley, 1923), gives this proportion for different values of the ratio.

CHART XI I

TABLE XI A
NON-CHANCE DIFFERENCES FOR DIFFERENT RATIOS

σ DUE TO CHANCE      PROPORTION OF DIFFERENCES
÷ σ OBSERVED         IN EXCESS OF THE CHANCE PROPORTION
    .02                  .950
    .05                  .888
    .10                  .798
    .15                  .719
    .20                  .647
    .25                  .582
    .30                  .522
    .35                  .467
    .40                  .415
    .45                  .367
    .50                  .323
    .55                  .281
    .60                  .242
    .65                  .205
    .70                  .171
    .75                  .138
    .80                  .108
    .85                  .078
    .90                  .051
    .95                  .025
    .99                  .005

Chart XI I approximately represents the situation found in connection with Stanford Achievement Test measures of Arithmetic Computation and of Arithmetic Reasoning, for 96 eighth-grade pupils.

Arithmetic Computation score = $x_1$, with a reliability of .669.

Arithmetic Reasoning score = $x_2$, with a reliability of .825.

$$\sigma(d_{12\cdot\infty y}) = \sqrt{2 - r_1 - r_2} = .711$$

$$\sigma(d_{12}) = \sqrt{2 - 2r_{12}} = 1.246$$

Ratio = .571, equivalent to 26 per cent of cases having differences in excess of chance.

$$\sigma(\bar d_{\infty y\cdot\infty y}) = \sqrt{r_1^2 - r_1^3 + r_2^2 - r_2^3} = .517$$

$$\sigma(\bar d_{\infty y}) = \sqrt{r_1^2 + r_2^2 - 2r_1r_2r_{12}} = .939$$

Ratio = .551, equivalent to 28 per cent of cases having differences in excess of chance.

For these same data the reliability of $d_{12}$, as given by [11:29a], is .674, and the reliability of $\bar d_{\infty y}$, as given by [11:37], is .697, where

$$r(\bar d_{\infty y}) = \frac{r_1^3 + r_2^3 - 2r_1r_2r_{12}}{r_1^2 + r_2^2 - 2r_1r_2r_{12}} \quad \text{(see [11:34])} \qquad [11:37]$$
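The sketch below reproduces the Arithmetic Computation and Reasoning figures and the entries of Table XI A. Two points are assumptions on our part. First, the text does not print $r_{12}$, so the sketch backs it out from $\sigma(d_{12}) = 1.246$ through [11:28a], giving roughly .224. Second, the Table XI A proportion is computed as the area in which the observed normal curve (standard deviation 1) lies above the chance normal curve (standard deviation equal to the ratio); this is our reading of the shaded portion of Chart XI I. With these assumptions the code gives, to the precision printed, the ratios .571 and .551, the reliabilities .674 and .697, and table entries such as .950 at a ratio of .02 and .323 at .50.

```python
import math
from statistics import NormalDist

Z = NormalDist()


def excess_over_chance(ratio):
    """Proportion of observed differences in excess of the chance proportion.
    Assumes normal curves with mean 0: observed SD = 1, chance SD = ratio (< 1)."""
    # The two density curves cross at +/- x.
    x = math.sqrt(2 * ratio ** 2 * math.log(1 / ratio) / (1 - ratio ** 2))
    # Beyond +/- x the observed curve is the higher one; sum the area difference there.
    return 2 * (Z.cdf(x / ratio) - Z.cdf(x))


r1, r2 = 0.669, 0.825          # reliabilities given in the text
r12 = 1 - 1.246 ** 2 / 2       # backed out from sigma(d12) = 1.246 via [11:28a]

sd_chance_d12 = math.sqrt(2 - r1 - r2)
sd_d12 = math.sqrt(2 - 2 * r12)
sd_chance_dbar = math.sqrt(r1**2 - r1**3 + r2**2 - r2**3)
sd_dbar = math.sqrt(r1**2 + r2**2 - 2 * r1 * r2 * r12)
ratio_d12 = sd_chance_d12 / sd_d12
ratio_dbar = sd_chance_dbar / sd_dbar
rel_d12 = (r1 + r2 - 2 * r12) / (2 - 2 * r12)                                         # [11:29a]
rel_dbar = (r1**3 + r2**3 - 2 * r1 * r2 * r12) / (r1**2 + r2**2 - 2 * r1 * r2 * r12)  # [11:37]

print(f"d12:   ratio {ratio_d12:.3f}  excess {excess_over_chance(ratio_d12):.3f}  reliability {rel_d12:.3f}")
print(f"d-bar: ratio {ratio_dbar:.3f}  excess {excess_over_chance(ratio_dbar):.3f}  reliability {rel_dbar:.3f}")
print([round(excess_over_chance(q), 3) for q in (0.02, 0.50, 0.90)])   # compare Table XI A
```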


Reliability and other issues involving three or more unequally reliable measures of the same thing

A typical situation would exist when several judges separately rate, or score, the same set of objects, be they people, cattle, art products, or what not. Each judge rates upon some commonly defined scale, like "impulsiveness of individuals," "general excellence of beef stock," "beauty of art sample," etc. The impulsiveness of a person can scarcely be better defined than as equal to the average rating, or score, given by an adequate number of those competent to rate. It is therefore appropriate and useful to assert that the judges are rating the same thing, though they may be unequally expert in doing it. Similarly for many other situations. We may thus write for the scores as given by the $k$ judges

$$x_1 = c_1x_\infty + e_1;\quad x_2 = c_2x_\infty + e_2;\quad \dots;\quad x_k = c_kx_\infty + e_k$$

In these equations the $c$'s change from judge to judge and depend upon the respective scales used, $x_\infty$ is the true score attaching to the objects scored, and the $e$'s are errors of judgment, uncorrelated with each other or with the true scores.

When we have three judges it follows, as shown
by Shen* (1925), that the reliability of a single
judge, say judge number 1, is

$$r_1 = \frac{r_{12}\,r_{13}}{r_{23}}$$

Triad formula for reliability [11:38]


The variance error of $r_1$ thus determined is

$$\sigma^2_{r_1} = \frac{r_1^2}{N-2}\left[4r_1 + \frac{2}{r_1} + \frac{1}{r_{23}^2} + (1 - 2r_1)\left(\frac{1}{r_{12}^2} + \frac{1}{r_{13}^2}\right) - 5\right]$$

[11:39]


* Formula [11:39] herewith differs slightly from Shen's formula,
in which there is a typographical error.
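A minimal sketch of the triad formula [11:38] and its variance error [11:39]. The correlations in the example are hypothetical, chosen to arise from loadings .8, .7, and .6 on a single true score, so that $r_1$ should come out as .64; the function names are illustrative.

```python
def triad_reliability(r12, r13, r23):
    """Reliability of measure 1 from the three intercorrelations, formula [11:38]."""
    return r12 * r13 / r23


def triad_variance_error(r12, r13, r23, n):
    """Variance error of the triad estimate, formula [11:39]."""
    r1 = triad_reliability(r12, r13, r23)
    bracket = (4 * r1 + 2 / r1 + 1 / r23**2
               + (1 - 2 * r1) * (1 / r12**2 + 1 / r13**2) - 5)
    return r1**2 / (n - 2) * bracket


# Hypothetical loadings .8, .7, .6 give r12 = .56, r13 = .48, r23 = .42.
r1 = triad_reliability(0.56, 0.48, 0.42)
se = triad_variance_error(0.56, 0.48, 0.42, n=100) ** 0.5
print(f"r1 = {r1:.3f}, standard error = {se:.3f}")
```

When all three correlations equal 1, the bracket is exactly zero, as it should be for perfectly reliable measures.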


If the number of measures of the same thing
is some number k, greater than three, we first
express each set of measures as standard scores,
thus $z_1 = (X_1 - M_1)/\sigma_1$, $z_2 = (X_2 - M_2)/\sigma_2$, $\ldots$,
$z_k = (X_k - M_k)/\sigma_k$. We let $s = z_1 + z_2 + \ldots + z_k$,
and let $s_1 = s - z_1$. Since the correlation between $z_1$
and $s_1$ corrected for attenuation is equal to
1.00, we have the reliability of the first set
of measures, as derived by Shen,

$$r_1 = \frac{(k-2)\,\bar r_{1j}^{\,2}}{k\,\bar r_{ij} - 2\,\bar r_{1j}}$$

Reliability from the intercorrelations between unequally
excellent measures of the same thing [11:40]

In this formula $\bar r_{ij}$ is the average of the
k(k-1)/2 intercorrelations between the k measures,
and $\bar r_{1j}$ is the average of the (k-1) correlations
between $z_1$ and the (k-1) remaining measures.

Frequently $\bar r_{ij}$ can be gotten by [10:105] and
$\bar r_{1j}$ by [10:103], or by these formulas modified to
be appropriate to standard score variables.
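The k-measure formula [11:40] can be sketched as follows, assuming the intercorrelations are supplied as a full correlation matrix (a list of lists). The helper that builds correlations from single-factor loadings is hypothetical and exists only to produce test data. With equal reliabilities the formula returns the common value exactly; with unequal ones it is an approximation, as the second example shows (.646 against a true .64).

```python
from itertools import combinations


def shen_reliability(r, i):
    """Reliability of measure i from a k x k correlation matrix, formula [11:40].

    r is a list of lists; only the off-diagonal entries are used.
    """
    k = len(r)
    pairs = list(combinations(range(k), 2))
    r_bar = sum(r[a][b] for a, b in pairs) / len(pairs)              # all k(k-1)/2
    r_bar_i = sum(r[i][j] for j in range(k) if j != i) / (k - 1)     # the k-1 with measure i
    return (k - 2) * r_bar_i**2 / (k * r_bar - 2 * r_bar_i)


def one_factor_matrix(loadings):
    """Correlations r_ij = a_i * a_j implied by a single true score (hypothetical)."""
    return [[1.0 if i == j else a * b for j, b in enumerate(loadings)]
            for i, a in enumerate(loadings)]


print(shen_reliability(one_factor_matrix([0.7] * 5), 0))             # equal reliabilities .49
print(shen_reliability(one_factor_matrix([0.8, 0.7, 0.6, 0.5]), 0))  # true value .64
```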

If we have four measures of the same thing,
but unequally reliable, it is simple to prove
that the three following tetrads, two of which
are independent, equal zero.

$$t_{1234} = r_{12}r_{34} - r_{13}r_{24} = 0$$

$$t_{1342} = r_{13}r_{24} - r_{14}r_{23} = 0$$

$$t_{1243} = r_{12}r_{34} - r_{14}r_{23} = 0$$

Value of tetrads when a single general factor accounts
for all intercorrelations [11:41]

The variance error of $t_{1234}$, as derived by Kelley
(1928), is

4^

X 

' N- 2 

■[^2 

-f-Jf ^ -fT ^ 

Tr l 3 Tr 2 

t +r 3 4 + 2 r i2 r i4 r 23 r 3. t 

+ 2 r 13 

r !4 

r 23 r 

2 4 - ^ r 

i 2 r i 3 r 2 3 

~ 2r i 2 r i 4 r 2 4 


- 2 r 13 

r !4 

r 3 u _ 

2r 23 r 

24 r 3'4 + t 

2 (r 2 +r 2 

1 23 4^ r l 2 F l; 

\ 

+ rl,+r 2 23 


4 + r 3 

4)] 


[11:42] 

Thus, in the case of four variables, we may
say that they may be conceived as due to a general
factor plus four specific factors when

$$r_{12}r_{34} = r_{13}r_{24} = r_{14}r_{23}$$
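The tetrad conditions can be checked numerically. This sketch builds correlations from one hypothetical set of general-factor loadings, so all three tetrads should vanish. It also shows why only two tetrads are independent: the third always equals the sum of the other two.

```python
def tetrads(r):
    """Tetrad differences t_1234, t_1342, t_1243 for four measures (0-based matrix r)."""
    r12, r13, r14 = r[0][1], r[0][2], r[0][3]
    r23, r24, r34 = r[1][2], r[1][3], r[2][3]
    return (r12 * r34 - r13 * r24,
            r13 * r24 - r14 * r23,
            r12 * r34 - r14 * r23)


def one_factor_matrix(loadings):
    """Correlations r_ij = a_i * a_j from one general factor (hypothetical loadings)."""
    return [[1.0 if i == j else a * b for j, b in enumerate(loadings)]
            for i, a in enumerate(loadings)]


t = tetrads(one_factor_matrix([0.9, 0.8, 0.7, 0.6]))
print(t)  # each value is zero up to rounding error
```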

Kelley has shown that the pentad function
following equals zero when five variables may be
thought of as consequent to two general factors
plus specific factors:

$$\begin{aligned} p_{12345} ={} & r_{12}r_{13}r_{24}r_{35}r_{45} + r_{14}r_{15}r_{23}r_{24}r_{35} + r_{12}r_{14}r_{25}r_{34}r_{35} \\ & + r_{13}r_{15}r_{24}r_{25}r_{34} + r_{12}r_{15}r_{23}r_{34}r_{45} + r_{13}r_{14}r_{23}r_{25}r_{45} \\ & - r_{12}r_{14}r_{23}r_{35}r_{45} - r_{13}r_{15}r_{23}r_{24}r_{45} - r_{12}r_{15}r_{24}r_{34}r_{35} \\ & - r_{13}r_{14}r_{24}r_{25}r_{35} - r_{12}r_{13}r_{25}r_{34}r_{45} - r_{14}r_{15}r_{23}r_{25}r_{34} = 0 \end{aligned}$$

[11:43]

The computation of the variance error of the 
function is rather laborious. The relationships 
immediately preceding and still other general 
relationships connected with "factor analysis" 
and the minimal necessary number of independent 
variables underlying a given set of correlations 
can be well expressed in the notation of matrix 
algebra. 
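A sketch of the pentad function [11:43]. Each of its twelve terms is a product of five correlations forming a closed chain through all five variables. The check below uses unscaled integer values $r_{ij} = 1 + ij$ (loadings of 1 on one factor and i on a second). These are not genuine correlations, but they make the two-factor zero exact. Because every term has degree five, rescaling all values by a constant cannot change whether the function vanishes.

```python
# Twelve closed chains through variables 1-5; each entry lists the five correlations.
POSITIVE = ["12 13 24 35 45", "14 15 23 24 35", "12 14 25 34 35",
            "13 15 24 25 34", "12 15 23 34 45", "13 14 23 25 45"]
NEGATIVE = ["12 14 23 35 45", "13 15 23 24 45", "12 15 24 34 35",
            "13 14 24 25 35", "12 13 25 34 45", "14 15 23 25 34"]


def pentad(r):
    """Pentad function [11:43]; r maps a pair label such as '12' to r_12."""
    def term(spec):
        product = 1
        for pair in spec.split():
            product *= r[pair]
        return product
    return sum(term(s) for s in POSITIVE) - sum(term(s) for s in NEGATIVE)


def two_factor_values(a, b):
    """Values a_i*a_j + b_i*b_j for variables numbered 1 to 5 (hypothetical loadings)."""
    return {f"{i}{j}": a[i - 1] * a[j - 1] + b[i - 1] * b[j - 1]
            for i in range(1, 6) for j in range(i + 1, 6)}


r = two_factor_values([1, 1, 1, 1, 1], [1, 2, 3, 4, 5])   # r_ij = 1 + i*j
print(pentad(r))        # prints 0: two general factors suffice
r["45"] += 1
print(pentad(r))        # prints -2 once one value is disturbed
```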


The weighting factor which is to be attached
to a variable because of its unreliability. Let
the standard scores $z_1, z_2, z_3, \ldots$, each be
measures of $z_\infty$ (and of no other true measure) and
have chance errors $e_1, e_2, e_3, \ldots$, whose
variances are unequal. To allow for different
units of measurement we write:

$$z_1 = a_1z_\infty + e_1, \quad\text{and}\quad 1 = a_1^2V_\infty + (1 - r_1)$$

$$z_2 = a_2z_\infty + e_2, \quad\text{and}\quad 1 = a_2^2V_\infty + (1 - r_2)$$

$$z_3 = a_3z_\infty + e_3, \quad\text{and}\quad 1 = a_3^2V_\infty + (1 - r_3), \quad\text{etc.}$$

( as here defined being different from as 
defined by [11:16] and [11:01].) 

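From the variances we note that

$$a_1 = \frac{\sqrt{r_1}}{\sigma_\infty}, \quad a_2 = \frac{\sqrt{r_2}}{\sigma_\infty}, \quad a_3 = \frac{\sqrt{r_3}}{\sigma_\infty}, \quad\text{etc.}$$

so that we may now redefine $z_\infty$ and write the z's
thus:

$$z_1 = \sqrt{r_1}\,z_\infty + e_1, \quad z_2 = \sqrt{r_2}\,z_\infty + e_2, \quad\text{etc.}$$

Taking the variance of $z_1$ we have $1 = r_1V_\infty + (1 - r_1)$,
so $V_\infty = 1$. Accordingly

$$\frac{z_1}{\sqrt{r_1}} = z_\infty + \frac{e_1}{\sqrt{r_1}}, \quad \frac{z_2}{\sqrt{r_2}} = z_\infty + \frac{e_2}{\sqrt{r_2}}, \quad\text{etc.}$$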

The left-hand members are unbiased estimates of
$z_\infty$. By [13:19] the best combination of them
will be that given by weighting them inversely
as their variance errors. The variance error of
$z_1/\sqrt{r_1}$ (equal to that of $e_1/\sqrt{r_1}$), as given by
[11:14], is $(1 - r_1)/r_1$. The inverse of this is
the proper weighting factor of $z_1/\sqrt{r_1}$, so that
the best estimate of $z_\infty$ is:

$$\tilde z_\infty = \frac{\dfrac{\sqrt{r_1}}{1 - r_1}z_1 + \dfrac{\sqrt{r_2}}{1 - r_2}z_2 + \dfrac{\sqrt{r_3}}{1 - r_3}z_3 + \cdots}{\dfrac{r_1}{1 - r_1} + \dfrac{r_2}{1 - r_2} + \dfrac{r_3}{1 - r_3} + \cdots}$$

[11:44]

Otherwise expressed, the ratios of weighting factors
to allow for differences in reliabilities,
as derived in another way by Kelley (1927, pp. 211-213),
are

$$\frac{\sqrt{r_1}}{1 - r_1} : \frac{\sqrt{r_2}}{1 - r_2} : \frac{\sqrt{r_3}}{1 - r_3}, \quad\text{etc.}$$

[11:45]


Of course these weights do not overtly modify
or supplant the weights in a multiple regression
equation, but lacking regression equation weights
these weighting factors may confidently be expected
to be beneficial. Table XI B herewith provides
these weights for different reliability
coefficients.
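The weights of [11:45] and the combined estimate of [11:44] are easy to compute directly. This sketch reproduces a few entries of Table XI B; the function names are illustrative.

```python
import math


def reliability_weight(r):
    """Weighting factor sqrt(r) / (1 - r) of [11:45] for a reliability r."""
    return math.sqrt(r) / (1 - r)


def best_true_estimate(z_scores, reliabilities):
    """Estimate of z_inf by [11:44]: z_i / sqrt(r_i) weighted by r_i / (1 - r_i)."""
    numerator = sum(reliability_weight(r) * z for z, r in zip(z_scores, reliabilities))
    denominator = sum(r / (1 - r) for r in reliabilities)
    return numerator / denominator


for r in (0.26, 0.51, 0.76, 0.97):
    print(r, round(reliability_weight(r), 3))   # compare Table XI B
```

If every $z_i$ were exactly $\sqrt{r_i}$ times a common true score, the combined estimate would return that true score, since the weights are normalized by their sum.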


TABLE XI B

WEIGHTING FACTORS CONSEQUENT TO VALUES
OF THE RELIABILITY COEFFICIENT

r_1   √r_1/(1-r_1)   r_1   √r_1/(1-r_1)   r_1   √r_1/(1-r_1)   r_1   √r_1/(1-r_1)
.01     .101         .26     .689         .51     1.46         .76     3.63
.02     .144         .27     .712         .52     1.50         .77     3.82
.03     .179         .28     .735-        .53     1.55-        .78     4.01
.04     .208         .29     .758         .54     1.60         .79     4.23
.05     .235+        .30     .782         .55     1.65-        .80     4.47

TABLE XI B (CONTINUED)

r_1   √r_1/(1-r_1)   r_1   √r_1/(1-r_1)
.56     1.70         .81      4.74
.57     1.76         .82      5.03
.58     1.81         .83      5.36
.59     1.87         .84      5.73
.60     1.94         .85      6.15-
.61     2.00         .86      6.62
.62     2.07         .87      7.17
.63     2.15-        .88      7.82
.64     2.22         .89      8.58
.65     2.30         .90      9.49
.66     2.39         .91     10.60
.67     2.48         .92     11.99
.68     2.58         .93     13.78
.69     2.68         .94     16.16
.70     2.79         .95     19.49
.71     2.91         .96     24.49
.72     3.03         .97     32.83


SECTION 2. THE EFFECT OF VARIABILITY
IN RANGE UPON CORRELATION


The effect of differences in range of talent
examined upon reliability coefficients. Consider
a test with scores, say, from 0 to 50, yielding
a mean of about 25 when given to the group to
which it is best adapted. If given to a markedly
inferior group the scores will tend to cluster
around 0, and if to a markedly superior group
they will tend to cluster around 50, and if given
to a group so heterogeneous as to include inferior,
average, and superior, the reliability
scatter diagram involving scores $X_1$ and $X_I$ upon
two similar forms of the test will tend to have
lanceolate contour lines. This is evidence
of malfunctioning of the instrument at both low
and high levels. If the malfunctioning is at
one of these levels only, pear-shaped contour
lines will be found. In connection with reliability
the first observation to make is of
the reliability scatter diagram, in order to ascertain
the range of effectiveness of the instrument.
The lanceolate contour lines indicate
poor functioning of the instrument, but this is
not revealed by the reliability coefficient,
which in this case is high and accordingly very
misleading.

If the instrument functions equally well
throughout the range of talent tested, then the
variability of differences $(X_1 - X_I)$ will tend to
be the same at all levels. The variability of
$(X_1 - X_I)$ is a variability at right angles to the
major axis of the contour ellipses in the $(X_1, X_I)$
scatter diagram and can be estimated by inspection
of the scatter diagram, or it can be computed
for different levels, a subject's level
being determined by $(X_1 + X_I)/2$. For an instrument
of uniform excellence we have

$$V(x_1 - x_I) = V_1 + V_I - 2\sigma_1\sigma_Ir_{1I} = 2V_1(1 - r_1)$$

thus

$$V_1(1 - r_1) = \text{a constant}$$

[11:46]

is a situation which maintains throughout the
range of equal effectiveness of the instrument.
This immediately provides us with a means of estimating
the reliability for one range of talent
knowing it for a different range. We will use
lower-case letters to indicate a narrower range
and capital letters a wider range. We have:

$$v_1(1 - r_1) = V_1(1 - R_1)$$

Relationship between reliabilities in the case of an
instrument of uniform merit [11:46a]

Introducing the variability of true scores, this
relationship becomes:

$$\frac{v_\infty(1 - r_1)}{r_1} = \frac{V_\infty(1 - R_1)}{R_1}$$

[11:47]

We may note that $v_1(1 - r_1)$ is the variance of
$(x_1 - x_\infty)$, so that equality, for different ranges,
of the variance of the differences $(x_1 - x_I)$ is
equivalent to the equality, for different ranges,
of the variance of the difference between observed
scores and true scores (see Kelley, 1921).

Table XI C, herewith, illustrates how closely 
relationship [11:46a] is maintained in the case 
of one instrument, designed to be effective from 
the second to the ninth grades.* 

TABLE XI C

DATA UPON STANFORD SPELLING TEST, FORMS A AND B

Number of cases approximately 165 per school grade

School       Means    Variances    Actual          Reliability from
grades                             reliability     total using [11:46a]
                                   coefficients
2             26.0      339.         .88             .81
3             48.8      390.         .87             .83
4             71.6      374.         .87             .82
5             96.8      594.         .90             .89
6            117.8      569.         .87             .88
7            139.4      454.         .82             .86
8            160.4      515.         .76             .87
9            172.2      465.         .90             .86
2 to 9
inclusive    104.1     2901.         .9774

* Kelley, Ruch, and Terman, STANFORD ACHIEVEMENT TEST MANUAL
OF DIRECTIONS, 1925 and revised, 1927, Tables 2 and 4.

The differences between the actual and the
estimated grade reliability coefficients are
somewhat greater than may reasonably be attributed
to chance. However, lacking specific narrow-range
determinations, formula [11:46a] should be
used to estimate the narrow-range reliability
knowing the wide-range value.
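Relationship [11:46a] solved for the narrow-range coefficient gives $r_1 = 1 - V_1(1 - R_1)/v_1$. The sketch below applies it to the grade variances of Table XI C and reproduces the last column of that table. The function name is illustrative.

```python
def narrow_range_reliability(R_wide, V_wide, v_narrow):
    """Estimate narrow-range reliability from v(1 - r) = V(1 - R), formula [11:46a]."""
    return 1 - V_wide * (1 - R_wide) / v_narrow


# Grade variances from Table XI C; the wide range is grades 2 to 9 combined.
grade_variances = {2: 339, 3: 390, 4: 374, 5: 594, 6: 569, 7: 454, 8: 515, 9: 465}
for grade, v in grade_variances.items():
    print(grade, round(narrow_range_reliability(0.9774, 2901, v), 2))
```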

For any problem the reliability coefficient
to be used is that which is appropriate to the
normal competitive group implied in the problem.
In school matters the normal competitive group is
the school grade and certainly not the group
given by combining grades 2 to 9. Even to report
a reliability coefficient for a noncompetitive
wide-range group is misinformative to all
except the statistically expert.

When an individual may be considered as either
a member of a narrow or of a wide-range group we
ask whether the connection in which we study him
makes any difference. We have two formula [11:20]
estimates of his ability:

$$\tilde X_\infty = r_1X_1 + (1 - r_1)m_1, \quad\text{using narrow-range statistics;}$$

$$\tilde X_\infty = R_1X_1 + (1 - R_1)M_1, \quad\text{using wide-range statistics.}$$

Formula [11:22] provides us with the variance
errors of these estimates. Utilizing [11:47] we
find that the ratio of these variance errors is

$$\frac{v_1(r_1 - r_1^2)}{V_1(R_1 - R_1^2)} = \frac{r_1}{R_1} < 1$$

[11:48]

showing that more reliable results are always
obtained by using the narrow-range group. In
short, if the individual may be treated as a
member of a number of groups, the most reliable
results will be obtained by treating him as a
member of the most homogeneous of these, which,
of course, is the group for which the reliability
coefficient is the smallest.
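The comparison in [11:48] can be checked for grade 2 of Table XI C. This sketch assumes, as the numerator and denominator of [11:48] imply, that the variance error of a regressed estimate is the variance times $r(1 - r)$. It takes the narrow-range reliability from [11:46a] rather than from the actual coefficient, and the function names are illustrative.

```python
def true_score_estimate(x, r, mean):
    """Regression estimate of a true score, formula [11:20]: r*x + (1 - r)*mean."""
    return r * x + (1 - r) * mean


def estimate_variance_error(variance, r):
    """Variance error of that estimate, variance * r * (1 - r), as used in [11:48]."""
    return variance * r * (1 - r)


# Grade 2 of Table XI C, with the narrow-range r taken from [11:46a].
R, V = 0.9774, 2901
v = 339
r = 1 - V * (1 - R) / v
ratio = estimate_variance_error(v, r) / estimate_variance_error(V, R)
print(ratio, r / R)   # the two agree and are below 1
```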

The effect of curtailment in range of a first
variable upon the reliability coefficient of a
second variable. If a reliability coefficient
is computed for a range of talent which has been
restricted in a known manner upon the basis of
another variable, it obviously is, in general, an
inaccurate measure of the reliability when no
such restriction is present. For example, the
reliability of a test, $X_2$, computed for those
who have attained a certain passing mark in $X_1$
is, in case $X_1$ and $X_2$ are correlated, smaller
than the reliability of $X_2$ for a group consisting
of those who have failed as well as passed $X_1$.
As deduced from a formula given by Davis (1944),
the reliability, $R_2$, in the wide range, knowing
that in the narrow range, $r_2$, knowing the correlation,
$r_{12}$, in the narrow range, and knowing
$V_1/v_1$, the ratio of the unrestricted variance in
$X_1$ to the restricted variance, is

[11:48]


The effect of curtailing the range of one of
the variables entering into a correlation problem.
Let the variables have the correlation $R_{12}$ and
let it further be true that the arrays are homoscedastic,
so that $V_{1\cdot2}$ is constant for successive
$X_2$ values. If there is an imposed alteration in
the $X_2$ values so that one or more of the arrays
of $X_1$ values are removed entirely, or if a randomly
chosen number of cases is withdrawn from
any one of these arrays, it will change
neither $V_{1\cdot2}$ nor $B_{12}$. Thus, letting lower-case
letters stand for the situation after the imposed
curtailment in the $X_2$ measures and a consequential
curtailment in the $X_1$ measures, we have:

$$v_{1\cdot2} = V_{1\cdot2}, \quad\text{or}\quad v_1(1 - r_{12}^2) = V_1(1 - R_{12}^2)$$

[11:49]

$$b_{12} = B_{12}, \quad\text{or}\quad r_{12}^2\frac{v_1}{v_2} = R_{12}^2\frac{V_1}{V_2}$$

[11:50]

so that

$$\frac{r_{12}^2}{v_2(1 - r_{12}^2)} = \frac{R_{12}^2}{V_2(1 - R_{12}^2)}$$

Relationship when a $v_2/V_2$ relationship is imposed [11:51]

Of course, when selection is on the basis of the
second variable, $v_{2\cdot1} \ne V_{2\cdot1}$ and $b_{21} \ne B_{21}$.
Though there has been alteration in the variances
of both variables, only the $v_2/V_2$ ratio is material
in estimating the change in the correlation
coefficient. Solving [11:51] for $R_{12}$, we
have, as derived by Pearson (1902 Inf.),


$$R_{12} = \frac{r_{12}}{\sqrt{r_{12}^2 + \dfrac{v_2}{V_2}\left(1 - r_{12}^2\right)}}$$

[11:52]


Table XI D gives the change in correlation 
with different curtailments (see Kelley, 1923). 

TABLE XI D

RELATIONSHIP BETWEEN CORRELATION AND AN IMPOSED

$\sigma_2/\Sigma_2$   If r = .1   r = .2   r = .3   r = .4   r = .6   r = .8   r = .95
           then R =    R =      R =      R =      R =      R =      R =
.75          .133       .263     .387     .503     .707     .872     .971
.50          .197       .378     .532     .653     .832     .936     .987
.25          .373       .632     .783     .868     .949     .983     .997
.10          .709       .898     .953     .975     .991     .997     .9995

An interesting possible use of [11:51] lies
in the study of biological phenomena. If, for
some form of life, in one geographical region
two characters have the constants $v_1$, $v_2$, $r_{12}$,
and in a second distant region the constants $V_1$,
$V_2$, $R_{12}$, and if it is believed that the life in
one region is a migrant and variable form of
that in the other, then the two characters $X_1$
and $X_2$ may be investigated to see which is the
more causal in bringing about the selection. If
$r_{12}^2/[v_2(1 - r_{12}^2)]$ is more nearly equal to $R_{12}^2/[V_2(1 - R_{12}^2)]$
than $r_{12}^2/[v_1(1 - r_{12}^2)]$ is to $R_{12}^2/[V_1(1 - R_{12}^2)]$, then
the indication is that $X_2$ is more closely connected
with the cause of selection than is $X_1$.
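Formula [11:52] regenerates Table XI D when the table's first column is read as the ratio of standard deviations $\sigma_2/\Sigma_2$, so that $v_2/V_2$ is its square. In this sketch every printed cell agrees to the third decimal except the entry for $\sigma_2/\Sigma_2 = .50$, r = .4, where the formula gives .658 rather than the .653 shown; the function name is illustrative.

```python
import math


def uncurtailed_r(r, sd_ratio):
    """Formula [11:52]: wide-range R_12 from narrow-range r_12 and sigma_2 / Sigma_2."""
    k = sd_ratio**2                      # v_2 / V_2
    return r / math.sqrt(r**2 + k * (1 - r**2))


R_VALUES = (0.1, 0.2, 0.3, 0.4, 0.6, 0.8, 0.95)
for sd_ratio in (0.75, 0.50, 0.25, 0.10):
    row = [round(uncurtailed_r(r, sd_ratio), 3) for r in R_VALUES]
    print(f"{sd_ratio:.2f}", row)
```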

If the curtailment is an arbitrary cutting
off in the case of one variable of a portion of
a distribution which is believed to be approximately
normal when uncurtailed, the ratio $v_2/V_2$
can be readily determined, knowing the portion
which has been deleted. For example, let us say
that Test $X_2$ is employed as an
admission-to-training-in-the-Air-Corps instrument; that the
form of distribution on this test of all applicants
is approximately normal; that arrays are
essentially homoscedastic; and that the lowest
25 per cent of those taking the test are not admitted.
For those admitted, the correlation of
$X_2$ with attained flying efficiency is $r_{12}$. What
reasonably would have been the correlation had
all the applicants been admitted to training?
We find the ratio of $v_2/V_2$ by [8:29], introduce
the result into [11:52] and find the $R_{12}$ which
is equivalent to $r_{12}$. Had $r_{12}$ been .50 the answer
would be $R_{12}$ = .62.*
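The Air Corps example can be reproduced with the standard variance of a normal distribution truncated from below, which is assumed here to be what [8:29] supplies. The sketch uses Python's `statistics.NormalDist`, and writes [11:52] with $v_2/V_2$ directly; the function names are illustrative.

```python
import math
from statistics import NormalDist


def truncated_variance_ratio(p_removed):
    """v/V for a normal variable when the lowest fraction p_removed is cut off."""
    std = NormalDist()
    a = std.inv_cdf(p_removed)            # cut point in standard units
    lam = std.pdf(a) / (1 - p_removed)    # mean of the retained part
    return 1 + a * lam - lam**2


def uncurtailed_r(r, variance_ratio):
    """Formula [11:52] written with v2/V2 directly."""
    return r / math.sqrt(r**2 + variance_ratio * (1 - r**2))


k = truncated_variance_ratio(0.25)
print(round(k, 3), round(uncurtailed_r(0.50, k), 2))
```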

* This procedure was extensively used by Carl Brown, in Robert M.
Yerkes, Ed., PSYCHOLOGICAL EXAMINING IN THE U.S. ARMY, Nat.
Acad. of Sci., Vol. 15, 1921.

The effect of double selection upon correlation.
This problem was attacked by Pearson
(1908) in the case of normal bivariate correlation.
The explicit solution is provided in Formula
[11:53], derived by Vern James.*

$$r_{12} = \frac{-(1 - R_{12}^2)\Sigma_1\Sigma_2 + \sqrt{(1 - R_{12}^2)^2V_1V_2 + 4v_1v_2R_{12}^2}}{2\sigma_1\sigma_2R_{12}}$$

[11:53]

The specific situation to which this formula ap- 
plies requires that (a) the distribution prior to 
selection, represented by capital letters, be a 
normal bivariate distribution and (b) that the 
distribution after selection, represented by 
lower-case letters, be a normal bivariate dis- 
tribution. 

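Formula [11:53] may be written

$$\frac{R_{12}}{\Sigma_1\Sigma_2(1 - R_{12}^2)} = \frac{r_{12}}{\sigma_1\sigma_2(1 - r_{12}^2)}$$

[11:54]

or again

$$\frac{R_{12}}{\Sigma_{1\cdot2}\Sigma_{2\cdot1}} = \frac{r_{12}}{\sigma_{1\cdot2}\sigma_{2\cdot1}}$$

[11:55]

revealing an invariant relationship which may
well be important in the study of natural selection.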
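A sketch of Vern James's formula [11:53], with the invariant of [11:54] used as a check. The variances in the example are hypothetical, and the function names are illustrative.

```python
import math


def james_selected_r(R, V1, V2, v1, v2):
    """Formula [11:53]: correlation after double selection, from R and all four variances."""
    S1, S2 = math.sqrt(V1), math.sqrt(V2)
    s1, s2 = math.sqrt(v1), math.sqrt(v2)
    num = -(1 - R**2) * S1 * S2 + math.sqrt((1 - R**2)**2 * V1 * V2 + 4 * v1 * v2 * R**2)
    return num / (2 * s1 * s2 * R)


def invariant(r, var1, var2):
    """r / (sigma1 * sigma2 * (1 - r^2)), equal before and after selection by [11:54]."""
    return r / (math.sqrt(var1 * var2) * (1 - r**2))


# Hypothetical example: R = .60 with unit variances, both variances halved by selection.
r = james_selected_r(0.60, 1.0, 1.0, 0.5, 0.5)
print(r, invariant(r, 0.5, 0.5), invariant(0.60, 1.0, 1.0))
```

With no selection at all (lower-case variances equal to the capital ones) the formula returns R unchanged.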

* A class paper submitted to the writer by Vern James, February,
1925. James made the following happy substitutions:
He let Pearson's = X; Pearson’s Pearson's 

$\Sigma_1/\sigma_1$ = a; Pearson's $\Sigma_2/\sigma_2$ = b.

SECTION 3. THREE-VARIABLE MULTIPLE CORRELATION


The three-variable problem will be approached
in several ways because it involves practically
all of the issues inherent in the general, or n-variable,
problem and is algebraically much simpler
than the general multiple-correlation problem.

Let $X_0$, the criterion or predictand, be a
measure which it is desired to estimate from a
knowledge of $X_1$ and $X_2$, the independent variables
or predictors. There is so great a simplification
when the regressions of $X_0$ are linear that
it is desirable to transform $X_1$ and $X_2$ into new
variables yielding linear relationships with $X_0$
if a preliminary study of the $X_0, X_1$ scatter
diagram, and/or of the $X_0, X_2$ scatter diagram,
reveals demonstrable nonlinearity. Though nonlinear
regression of $X_1$ or $X_2$ upon $X_0$ is not a
major detriment, the writer has often found
that a single transformation of the $X_0$ variable
will rectify the regression of $X_0$
upon a number of other variables, $X_1$, $X_2$, $X_3$,
etc., a result otherwise accomplished only by
making transformations in all of the predictor
variables.

In the following treatment we assume that the
regression of $X_0$ upon $X_1$ and upon $X_2$ is approximately
linear, and that an equation of the type

$$\tilde X_0 = a + b_1X_1 + b_2X_2$$

3-variable multiple regression equation [11:56]

$$\tilde x_0 = b_1x_1 + b_2x_2$$

[11:56a]

will constitute a sound basis for estimating $X_0$.
A more explicit notation for the regression
coefficients is $b_{01\cdot2}$, read "the regression of
$X_0$ upon $X_1$ for $X_2$ constant," and $b_{02\cdot1}$. Where
there is no ambiguity we shall use the shorter
notation for both b and β (see [11:57]) regression
coefficients.

Let standard scores $z_0$, $z_1$, and $z_2$ be as defined
by [8:23]. Then [11:56] is represented by
an equation of the type

$$\tilde z_0 = \beta_1z_1 + \beta_2z_2$$

Standard score regression equation [11:57]
We have

$$b_1 = \beta_1\frac{\sigma_0}{\sigma_1}, \quad\text{and}\quad b_2 = \beta_2\frac{\sigma_0}{\sigma_2}$$

[11:58]

$$a = M_0 - b_1M_1 - b_2M_2$$

[11:59]


When the standard score $z_0$ is estimated by means
of [11:57] the variance error of estimate, $k_{0\cdot12}^2$,
is as follows:

$$k_{0\cdot12}^2 = \frac{\Sigma(z_0 - \beta_1z_1 - \beta_2z_2)^2}{N} = 1 + \beta_1^2 + \beta_2^2 - 2\beta_1r_{01} - 2\beta_2r_{02} + 2\beta_1\beta_2r_{12}$$

[11:60]

It is desired so to determine the β's that $k_{0\cdot12}^2$
shall be a minimum. We may, without loss of
generality, first hold $\beta_2$ fixed and complete the
square in $\beta_1$ in equation [11:60]. We obtain,

$$k_{0\cdot12}^2 = [\beta_1 - (r_{01} - \beta_2r_{12})]^2 + 1 + \beta_2^2 - 2\beta_2r_{02} - (r_{01} - \beta_2r_{12})^2$$

To make $k_{0\cdot12}^2$ a minimum, $\beta_1$ must be given such a
value as to make the [ ] term, which is a perfect
square, equal to zero. Then

$$k_{0\cdot12}^2 = 1 - r_{01}^2 - \frac{(r_{02} - r_{01}r_{12})^2}{1 - r_{12}^2} + (1 - r_{12}^2)\left[\beta_2 - \frac{r_{02} - r_{01}r_{12}}{1 - r_{12}^2}\right]^2$$

This is a minimum when the [ ] term, which is a
perfect square, is equal to zero, thus

$$\beta_2 = \frac{r_{02} - r_{01}r_{12}}{1 - r_{12}^2}$$

β, or standard score, multiple regression coefficient
(3 variables) [11:61]

$$\beta_1 = \frac{r_{01} - r_{02}r_{12}}{1 - r_{12}^2}$$

[11:61a]

$$k_{0\cdot12}^2 = \frac{1 - r_{01}^2 - r_{02}^2 - r_{12}^2 + 2r_{01}r_{02}r_{12}}{1 - r_{12}^2}$$

Variance error of estimate of standard score
criterion (3 variables) [11:62]


The correlation between $z_0$ and $\tilde z_0$ is the multiple
correlation coefficient and is always positive.
Since there may be various estimates of $z_0$, a more
explicit notation than $\tilde z_0$ is needed. We shall use
$z_{0\Delta12}$, which may be read "that part of $z_0$ which
is dependent upon $z_1$ and $z_2$." The subscript Δ
may be thought of as standing for the word "dependent,"
just as in $z_{0\cdot12}$, "that part of $z_0$ which
is independent of $z_1$ and $z_2$," the subscript dot
may be thought of as the dot to the letter i in the
word "independent." The correlation between $z_0$
and the best linear combination of $z_1$ and $z_2$ is a
multiple correlation. We shall designate it $r_{0\Delta12}$.
Other notations are $r_{0(12)}$ and $r_{0\cdot12}$,
but this last, which the writer has employed
earlier, will be avoided as incongruous with the
"independent of" concept which, as here used, attaches
to the subscript dot. For the correlation
between $z_0$ and $z_{0\cdot12}$, that part of $z_0$ which is independent
of $z_1$ and $z_2$, we shall, as in the preceding
derivation, use the notation $k_{0\cdot12}$. Though $r_{0\cdot12}$
would be an entirely logical notation for this, it
will not be used in this text because of its
earlier use in the opposite sense. Utilizing the
principle inherent in [10:16] and [10:17],

$$V(z_0) = V(z_{0\Delta12}) + V(z_{0\cdot12})$$

$$1 = (1 - k_{0\cdot12}^2) + k_{0\cdot12}^2 = r_{0\Delta12}^2 + k_{0\cdot12}^2$$

[11:63]

Some computational procedures yield $k_{0\cdot12}$, in
which case we compute $r_{0\Delta12}$ from the equation

$$r_{0\Delta12}^2 = 1 - k_{0\cdot12}^2$$

[11:64]

Other computational procedures yield the β regression
coefficients, in which case it is simple
to compute $r_{0\tilde z_0}$, which is identical with $r_{0\Delta12}$,
thus, when we note that $\sigma(z_{0\Delta12}) = r_{0\Delta12}$,

$$r_{0\Delta12} = r_{0(0\Delta12)} = \frac{\Sigma z_0(\beta_1z_1 + \beta_2z_2)}{N\sigma(z_{0\Delta12})} = \frac{\beta_1r_{01} + \beta_2r_{02}}{r_{0\Delta12}}$$

$$r_{0\Delta12}^2 = \beta_1r_{01} + \beta_2r_{02}$$

Computational formula for $r_{0\Delta12}$ knowing
β regression coefficients [11:65]
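The three-variable results [11:60] to [11:65] need only the three intercorrelations. This sketch, with hypothetical correlations, computes both β's, confirms that [11:64] and [11:65] give the same squared multiple correlation, and checks that moving either weight away from its solution raises the value of [11:60]; the function names are illustrative.

```python
def three_variable(r01, r02, r12):
    """Beta weights, alienation k^2 and squared multiple correlation (3 variables)."""
    d = 1 - r12**2
    beta1 = (r01 - r02 * r12) / d                                      # [11:61a]
    beta2 = (r02 - r01 * r12) / d                                      # [11:61]
    k2 = (1 - r01**2 - r02**2 - r12**2 + 2 * r01 * r02 * r12) / d      # [11:62]
    R2 = beta1 * r01 + beta2 * r02                                     # [11:65]
    return beta1, beta2, k2, R2


def k2_for_weights(b1, b2, r01, r02, r12):
    """Right-hand side of [11:60] for arbitrary weights b1, b2."""
    return 1 + b1**2 + b2**2 - 2 * b1 * r01 - 2 * b2 * r02 + 2 * b1 * b2 * r12


beta1, beta2, k2, R2 = three_variable(0.6, 0.5, 0.3)   # hypothetical correlations
print(beta1, beta2, R2, 1 - k2)                         # R2 equals 1 - k2, as in [11:64]
```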

The variance error of $z_0$, as estimated by
means of [11:57], is $k_{0\cdot12}^2$, and of $X_0$, as estimated
by means of [11:56], is

$$V_{0\cdot12} = V_0k_{0\cdot12}^2 = V_0(1 - r_{0\Delta12}^2)$$

Variance error of estimate, 3 variables [11:66]

Corresponding to the analysis of variance
equation [11:63] is the following, which involves
raw $X_0$ scores:

$$x_0 = x_{0\Delta12} + x_{0\cdot12}$$

$$V_0 = V_0r_{0\Delta12}^2 + V_0(1 - r_{0\Delta12}^2)$$

Analysis of the variance of $X_0$ scores [11:67]

$$N - 1 = 2 + (N - 3)$$

Degrees of freedom equation [11:68]

That the residual variance $V_{0\cdot12}$ has (N-3) degrees
of freedom is obvious when we note that the
following three linear restrictions, and none
other, have been imposed upon the $X_0$ scores:

$$NM_0 = \Sigma X_0; \quad Nc_{01} = \Sigma(X_0 - M_0)x_1; \quad Nc_{02} = \Sigma(X_0 - M_0)x_2$$

The nonlinear restriction $NV_0 = \Sigma(X_0 - M_0)^2$ is not
involved in a, $b_1$, or $b_2$, since $b_1$ may be written

$$b_1 = \frac{c_{01}/\sigma_1 - c_{02}r_{12}/\sigma_2}{\sigma_1(1 - r_{12}^2)}$$

a function not involving $\sigma_0$, or $V_0$. Similarly
for $b_2$. We can get the number of degrees of
freedom of $V_{0\Delta12}$ by subtracting (N-3) from (N-1),
or by noting that the regression equation [11:56]
has two degrees of freedom due to the two parameters
$b_1$ and $b_2$, for $M_0$, the additional parameter
involved in a, has already established the
degrees of freedom of $V_0$ as (N-1), not N.

The variance ratio,

$$F_{2,N-3} = \frac{r_{0\Delta12}^2/2}{(1 - r_{0\Delta12}^2)/(N - 3)}$$

[11:69]

provides a test for the significance of $(r_{0\Delta12} - 0)$.
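Working from raw scores, the following sketch fits [11:56] through [11:58] and [11:59], computes the variance ratio [11:69], and returns the residuals $x_{0\cdot12}$. The three linear restrictions named above can then be checked directly: the residuals have zero mean and zero covariance with $x_1$ and $x_2$. The data in the test are arbitrary illustrative numbers, and the function names are illustrative.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    """Covariance with divisor N, as in the text's moments."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def fit_three_variable(x0, x1, x2):
    """Raw-score equation [11:56] via [11:58]-[11:59], plus F of [11:69] and residuals."""
    n = len(x0)
    s0, s1, s2 = (math.sqrt(cov(v, v)) for v in (x0, x1, x2))
    r01 = cov(x0, x1) / (s0 * s1)
    r02 = cov(x0, x2) / (s0 * s2)
    r12 = cov(x1, x2) / (s1 * s2)
    beta1 = (r01 - r02 * r12) / (1 - r12**2)
    beta2 = (r02 - r01 * r12) / (1 - r12**2)
    b1, b2 = beta1 * s0 / s1, beta2 * s0 / s2
    a = mean(x0) - b1 * mean(x1) - b2 * mean(x2)
    R2 = beta1 * r01 + beta2 * r02
    F = (R2 / 2) / ((1 - R2) / (n - 3))
    residuals = [y - (a + b1 * u + b2 * w) for y, u, w in zip(x0, x1, x2)]
    return a, b1, b2, R2, F, residuals
```

The residual variance computed from these residuals equals $V_0(1 - r_{0\Delta12}^2)$, which is [11:66].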
In the special case in which $r_{12} = 0$, a situation
which can frequently be brought about by
an appropriate design of an experiment, we note
that all of the correlations between $x_1$, $x_2$, and
$x_{0\cdot12}$ are equal to zero, so that we have:

$$x_0 = x_{0\Delta12} + x_{0\cdot12} = b_1x_1 + b_2x_2 + x_{0\cdot12}$$

$$V_0 = b_1^2V_1 + b_2^2V_2 + V_{0\cdot12}$$

Analysis of variance when $X_1$ and $X_2$ are independent,
3 variables [11:70]

$$N - 1 = 1 + 1 + (N - 3)$$

Degrees of freedom equation [11:71]

The variance ratios,

$$F_{1,N-3} = \frac{b_1^2V_1/1}{V_{0\cdot12}/(N - 3)} \quad\text{and}\quad F_{1,N-3} = \frac{b_2^2V_2/1}{V_{0\cdot12}/(N - 3)}$$

Variance ratios when predictors are uncorrelated [11:72]

provide tests for the significance of $(b_1 - 0)$
and of $(b_2 - 0)$. If we desire a test for the deviation
of $b_1$ from $B_1$, a value not zero, we have,

$$F_{1,N-3} = \frac{(b_1 - B_1)^2V_1}{V_{0\cdot12}/(N - 3)}$$

(predictors independent) (cf. [11:78]) [11:73]

The standard score regression coefficients, $\beta_1$
and $\beta_2$, may be interpreted in connection with
partial correlation, which is the correlation between
parts of variables. If $z_{0\cdot2}$ is that part
of $z_0$ that is independent of $z_2$, it is given by
$z_{0\cdot2} = z_0 - r_{02}z_2$. Its variance is $k_{02}^2 = 1 - r_{02}^2$.
Similarly $z_{1\cdot2} = z_1 - r_{12}z_2$, and the variance of
$z_{1\cdot2}$ is $k_{12}^2 = 1 - r_{12}^2$. The correlation between
$z_{0\cdot2}$ and $z_{1\cdot2}$ is written $r_{01\cdot2}$ and is read "the
correlation between variables 0 and 1 independent
of the effect of variable 2," or "the correlation
between variables 0 and 1 for variable 2 constant."

$$r_{01\cdot2} = \frac{\Sigma z_{0\cdot2}z_{1\cdot2}}{Nk_{02}k_{12}} = \frac{r_{01} - r_{02}r_{12}}{\sqrt{1 - r_{02}^2}\,\sqrt{1 - r_{12}^2}}$$

Partial correlation coefficient, 3 variables [11:74]

Utilizing [11:61a] we obtain

$$\beta_1 = \beta_{01\cdot2} = r_{01\cdot2}\frac{k_{02}}{k_{12}}$$

Relationship between regression and partial correlation
coefficients, standard scores [11:75]

$$b_1 = b_{01\cdot2} = r_{01\cdot2}\frac{\sigma_{0\cdot2}}{\sigma_{1\cdot2}}$$

Relation between regression and partial correlation
coefficients [11:76]


Yule (1907) and Fisher (1923-24) have shown the
similarity of the properties and the distribution
of the partial correlation coefficient to
those of a total correlation coefficient. The
number of degrees of freedom in the partial
correlation coefficient is less than in the total
correlation coefficient by the number of secondary
subscripts (those after the point) in the
partial correlation coefficient.
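Partial correlation [11:74] and relation [11:75] in a short sketch. The correlations are hypothetical, and the function name is illustrative.

```python
import math


def partial_r(r01, r02, r12):
    """Partial correlation r_{01.2}, formula [11:74]."""
    return (r01 - r02 * r12) / (math.sqrt(1 - r02**2) * math.sqrt(1 - r12**2))


r01, r02, r12 = 0.6, 0.5, 0.3                  # hypothetical correlations
p = partial_r(r01, r02, r12)
k02, k12 = math.sqrt(1 - r02**2), math.sqrt(1 - r12**2)
beta1 = (r01 - r02 * r12) / (1 - r12**2)       # [11:61a]
print(p, beta1, p * k02 / k12)                 # last two agree, as in [11:75]
```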

If $x_{1\cdot2}$ were used to estimate $x_{0\cdot2}$, the regression
equation would be

$$\tilde x_{0\cdot2} = r_{01\cdot2}\frac{\sigma_{0\cdot2}}{\sigma_{1\cdot2}}x_{1\cdot2} = b_{01\cdot2}x_{1\cdot2} = b_1x_{1\cdot2}$$

[11:77]

Thus we note that the proper weight to attach to
$x_1$ in [11:56a], in estimating $x_0$, is identically
that which would attach to that part of $x_1$ which
is independent of $x_2$ when estimating that part
of $x_0$ which is independent of $x_2$. Similarly for
the other independent variable.

Reference to [11:77] informs us that $b_{01\cdot2}$ is
a regression coefficient in a two-variable problem
in which the variables are $x_{0\cdot2}$ and $x_{1\cdot2}$,
each having N - 2 degrees of freedom, so that a
precise variance ratio test is given by [10:39],
which becomes

$$F_{1,N-3} = \frac{(b_{01\cdot2} - B_{01\cdot2})^2V_{1\cdot2}(N - 3)}{V_{0\cdot2}(1 - r_{01\cdot2}^2)}$$

Since $V_{1\cdot2} = V_1k_{12}^2$ and $V_{0\cdot2}(1 - r_{01\cdot2}^2) = V_{0\cdot12} = V_0k_{0\cdot12}^2$,

$$F_{1,N-3} = \frac{(b_{01\cdot2} - B_{01\cdot2})^2V_1k_{12}^2(N - 3)}{V_0k_{0\cdot12}^2}$$

F to test $(b_{01\cdot2} - B_{01\cdot2})$. See [12:48]. [11:78]
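A sketch of the F ratio [11:78] for $b_{01\cdot2}$, together with the variance error [11:79] that it implies. Here B is the hypothesized value of the regression coefficient, and the function names are illustrative. The test confirms that the three-variable form agrees with the two-variable form built from $x_{0\cdot2}$ and $x_{1\cdot2}$.

```python
def k2_012(r01, r02, r12):
    """Alienation k^2_{0.12} of [11:62]."""
    return (1 - r01**2 - r02**2 - r12**2 + 2 * r01 * r02 * r12) / (1 - r12**2)


def b_variance_error(V0, V1, r01, r02, r12, n):
    """Variance error of b_{01.2}, formula [11:79]."""
    return V0 * k2_012(r01, r02, r12) / (V1 * (1 - r12**2) * (n - 3))


def F_for_b(b, B, V0, V1, r01, r02, r12, n):
    """Formula [11:78]: F with 1 and N - 3 degrees of freedom for b_{01.2} = B."""
    return (b - B)**2 / b_variance_error(V0, V1, r01, r02, r12, n)
```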


The parent population must here be conceived as
one in which the identical $x_1$ and $x_2$ values as
paired in this sample are repeated upon all successive
samplings. Under these conditions of
sampling there will result a succession of values
of $b_{01\cdot2}$. It is the variability of these that
is tested by [11:78]. Since this variance ratio
has one degree of freedom in the numerator variance,
we may write

$$V(b_{01\cdot2}) = \frac{V_0k_{0\cdot12}^2}{V_1k_{12}^2(N - 3)}$$

Variance error of regression coefficient,
three variables [11:79]

Determinantal expression of multiple correlation
relationships. We herewith give these relationships
for the three-variable problem and
state, without proof, that they also hold for
the general case of k variables, if expressed in
connection with a major determinant, of the type
Δ herewith, but expanded to include the added
variables. For the three-variable case we have:

$$\Delta = \begin{vmatrix} 1 & r_{01} & r_{02} \\ r_{01} & 1 & r_{12} \\ r_{02} & r_{12} & 1 \end{vmatrix}$$

Major correlation determinant, 3 variables [11:80]

$$\Delta_{01} = \begin{vmatrix} r_{01} & r_{12} \\ r_{02} & 1 \end{vmatrix}$$

Minor obtained by deleting the 0 row and 1 column

Other minors are similarly designated, and Δ, with
subscripts, stands for a co-factor, which is a
minor with a sign factor attached.

$$1 - r_{0\Delta12}^2 = k_{0\cdot12}^2 = \frac{\Delta}{\Delta_{00}}$$

Multiple alienation coefficient, or variance error
of estimated standard scores [11:81]

$$r_{01\cdot2} = \sqrt{\beta_{01\cdot2}\,\beta_{10\cdot2}} = \sqrt{\frac{\Delta_{01}^2}{\Delta_{00}\Delta_{11}}}$$

[11:83]

$\beta_{01\cdot2}$ and $\beta_{10\cdot2}$ have the same sign, and the sign
that attaches to the radical is this sign. These
two β's are conjugate regression coefficients.
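The determinantal forms [11:80], [11:81], and [11:83] can be checked against the direct formulas of Section 3. This sketch writes the small 2 × 2 and 3 × 3 determinants out by hand, using hypothetical correlations; the helper names are illustrative.

```python
def det3(m):
    """Determinant of a 3 x 3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def det2(m):
    """Determinant of a 2 x 2 matrix given as nested lists."""
    (a, b), (c, d) = m
    return a * d - b * c


r01, r02, r12 = 0.6, 0.5, 0.3                                  # hypothetical correlations
delta = det3([[1, r01, r02], [r01, 1, r12], [r02, r12, 1]])    # [11:80]
delta00 = det2([[1, r12], [r12, 1]])     # minor deleting row 0, column 0
delta11 = det2([[1, r02], [r02, 1]])     # minor deleting row 1, column 1
delta01 = det2([[r01, r12], [r02, 1]])   # minor deleting row 0, column 1
k2 = delta / delta00                                   # [11:81]
partial = (delta01**2 / (delta00 * delta11)) ** 0.5    # size of r_{01.2}, [11:83]
print(k2, partial)
```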

SECTION 4. NONLINEAR REGRESSION

Notation illustrated in connection with a
three-variable linear regression situation. The
theoretical issues and the illustrative problem
will refer to a quadric or second-degree parabola,
but the method can be immediately extended
to higher-degree parabolic regression. It is, in
fact, so slight a modification of multiple-regression
procedure as to call for few new concepts.
A new notation will be found convenient.
By illustrating this for a three-variable linear
multiple-regression problem, the similarity between
that problem and parabolic regression will
be made apparent. In connection with the three-variable
linear-regression problem there are
slightly different procedures dependent upon
whether the task set is to compute the constants
of [11:56], [11:56a], or [11:57]. The major determinant
pertinent to the determination of the
constants in [11:57] will be called Δ, or A, that
pertinent to [11:56a] δ, and that pertinent to [11:56]
d or D, depending upon whether means or summations
are employed. We employ p to designate a product
moment involving x, or deviation, scores, P a product
moment involving X, or gross scores, and Σ a
summation of gross scores. Σ, P, and p will each,
in this three-variable problem, have three subscripts,
the first indicating the power to which
the first variable ($X_0$ or $x_0$) is raised, the second
the power to which the second variable ($X_1$ or $x_1$)
is raised, and the third the power to which the
third variable ($X_2$ or $x_2$) is raised. Thus

'Zxlxix't

Pi 0 1 Poll Po 0 2

$$D = \begin{vmatrix} \Sigma_{200} & \Sigma_{100} & \Sigma_{110} & \Sigma_{101} \\ \Sigma_{100} & \Sigma_{000} & \Sigma_{010} & \Sigma_{001} \\ \Sigma_{110} & \Sigma_{010} & \Sigma_{020} & \Sigma_{011} \\ \Sigma_{101} & \Sigma_{001} & \Sigma_{011} & \Sigma_{002} \end{vmatrix}$$

[11:85]

in which $\Sigma_{000} = N$ [11:86]


The complete solution, starting with A, is as indicated in Section 3. The solution starting with δ or d will be obvious from that based upon D, which is given herewith.

When using subscripts we will call the first row of D the 0-row; the second row the a-row; the third row the 1-row; and the fourth row the 2-row; and similarly for the columns. Thus $D_{0a}$ is the cofactor of the element $\Sigma_{100}$. Note that the determination of the sign factor, which is simple in any case, must be as though the numbering were 0, 1, 2, 3 and not, as here used, 0, a, 1, 2.

The various statistics, most of which are unnecessary if only the regression equation is desired, are

$$M_0 = \frac{\Sigma_{100}}{N}; \qquad M_1 = \frac{\Sigma_{010}}{N}; \qquad M_2 = \frac{\Sigma_{001}}{N} \qquad \text{[11:87]}$$

$$V_0 = \frac{\Sigma_{200}}{N} - M_0^2; \qquad V_1 = \frac{\Sigma_{020}}{N} - M_1^2; \qquad V_2 = \frac{\Sigma_{002}}{N} - M_2^2 \qquad \text{[11:88]}$$

$$r_{12} = \frac{\Sigma_{011}/N - M_1M_2}{\sigma_1\sigma_2} \qquad \text{[11:89]}$$

$$a = -\frac{D_{0a}}{D_{00}}; \qquad b_1 = -\frac{D_{01}}{D_{00}}; \qquad b_2 = -\frac{D_{02}}{D_{00}} \qquad \text{[11:90], [11:91], [11:92]}$$

$$V_{0.12} = \frac{D}{N\,D_{00}} \qquad \text{[11:93]}$$

Squared multiple correlation:

$$r^2_{0A12} = 1 - \frac{D}{N\,V_0\,D_{00}} \qquad \text{[11:94]}$$

Partial correlation coefficients (the sign of $r_{01.2}$ is that of $b_{01.2}$):

$$r^2_{01.2} = b_{01.2}\,b_{10.2} = \frac{D_{01}^2}{D_{00}D_{11}}; \qquad r^2_{02.1} = \frac{D_{02}^2}{D_{00}D_{22}}; \qquad r^2_{12.0} = \frac{D_{12}^2}{D_{11}D_{22}} \qquad \text{[11:95]}$$



Though the proof is omitted, it can be stated that the preceding formulas of this section hold for problems of any number of variables by simply adding the necessary secondary subscripts to $V_{0.12}$, $r_{0A12}$, $r_{01.2}$, etc.
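As a rough check on the determinant formulas, the following Python sketch (NumPy assumed; the function names are hypothetical) builds D of [11:86] from gross-score sums with rows and columns ordered 0, a, 1, 2, and then applies [11:88] and [11:90] to [11:94]. The determinants are evaluated only to mirror the text. For real work a least-squares solver is numerically preferable.

```python
import numpy as np

def cofactor(M, i, j):
    """Signed cofactor of element (i, j) of the square matrix M."""
    minor = np.delete(np.delete(M, i, axis=0), j, axis=1)
    return (-1) ** (i + j) * np.linalg.det(minor)

def gross_score_regression(x0, x1, x2):
    """Constants of X0 = a + b1*X1 + b2*X2 from the determinant D of [11:86].

    Rows and columns are ordered 0, a, 1, 2, so the NumPy indices 0..3 give
    the sign factor "as though the numbering were 0, 1, 2, 3".
    """
    x0, x1, x2 = (np.asarray(v, dtype=float) for v in (x0, x1, x2))
    N = len(x0)
    cols = [x0, np.ones(N), x1, x2]                  # X0, constant, X1, X2
    D = np.array([[np.sum(u * v) for v in cols] for u in cols])  # Sigma_abc sums
    D00 = cofactor(D, 0, 0)
    a = -cofactor(D, 0, 1) / D00                     # [11:90]
    b1 = -cofactor(D, 0, 2) / D00                    # [11:91]
    b2 = -cofactor(D, 0, 3) / D00                    # [11:92]
    V0 = np.sum(x0 ** 2) / N - (np.sum(x0) / N) ** 2  # [11:88]
    V0_12 = np.linalg.det(D) / (N * D00)             # [11:93]
    r2 = 1 - V0_12 / V0                              # [11:94]
    return a, b1, b2, V0_12, r2
```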

Parabolic regression. If the third variable is simply the second squared, the regression equation is

$$X_{0A1,1^2} = A + BX_1 + CX_1^2 \qquad \text{[11:96] (second-degree parabolic regression)}$$


The linear estimate, $X_{0A1}$, may be written

$$X_{0A1} = a + bX_1 \qquad \text{[11:97] (linear regression)}$$


In the case of [11:96] the correlation obtained may be designated $r_{0A1,1^2}$ and called a second-degree parabolic correlation coefficient.

If we substitute $X_1^2$ for $X_2$ in [11:56], the major determinant [11:86] then becomes

$$D = \begin{vmatrix} \Sigma_{200} & \Sigma_{100} & \Sigma_{110} & \Sigma_{120}\\ \Sigma_{100} & \Sigma_{000} & \Sigma_{010} & \Sigma_{020}\\ \Sigma_{110} & \Sigma_{010} & \Sigma_{020} & \Sigma_{030}\\ \Sigma_{120} & \Sigma_{020} & \Sigma_{030} & \Sigma_{040} \end{vmatrix}$$

and thence the procedure is as before. The calculation of the variance error, of the parabolic correlation, and of the regression constants A, B, and C follows the pattern of [11:93], [11:94], [11:90], [11:91], and [11:92], but the variances of B and C do not follow the pattern of $b_1$ and $b_2$ [11:79]. C could be written $b_{01^2.1}$ and read, "the regression of $X_0$ upon $X_1^2$ for fixed values of $X_1$." However, when $X_1$ is fixed, $X_1^2$ is of course also fixed. We are unable to determine the variance of $b_{01^2.1}$ without fixing $X_1$ (see Chapter XII), but now a different and more powerful procedure is open to us.

When $X_0$ is estimated from a knowledge of $X_1$, we have [11:97], and we can analyze $V_0$ into independent parts,

$$V_0 = V_{0A1} + V_{0.1} \qquad \text{[11:99]}$$

$$N - 1 = 1 + (N-2) \qquad \text{corresponding d.o.f. equation} \qquad \text{[11:100]}$$

$V_{0A1}$ is the variance of $X_0$ dependent upon $X_1$ and equals $V_0 r_{01}^2$, the variance of the points on the linear regression line, and $V_{0.1}$ is the variance of the divergences from these points.

In the case of second-degree parabolic regression we have [11:96], and we can analyze $V_0$ into the following independent parts:

$$V_0 = V_{0A1,1^2} + V_{0.1,1^2} \qquad \text{[11:101]}$$

$$N - 1 = 2 + (N-3) \qquad \text{corresponding d.o.f. equation} \qquad \text{[11:102]}$$

$V_{0A1,1^2}$ is the variance of the points on the second-degree parabolic regression line, and $V_{0.1,1^2}$ is the variance of the divergences from these points.

Also $V_{0.1}$ of [11:99] can be analyzed into independent parts: a part which can be estimated from $X_1^2$ but not from $X_1$, and a part which cannot be so estimated:

$$V_{0.1} = V_{(0.1)A(1^2.1)} + V_{0.1,1^2} \qquad \text{[11:103]}$$

Substituting in [11:99] we have

$$V_0 = V_{0A1} + V_{(0.1)A(1^2.1)} + V_{0.1,1^2} \qquad \text{[11:104]}$$

$$V_0 = V_0\,r_{01}^2 + V_0\left(r^2_{0A1,1^2} - r_{01}^2\right) + V_0\left(1 - r^2_{0A1,1^2}\right)$$

$$N - 1 = 1 + 1 + (N-3) \qquad \text{d.o.f. equation} \qquad \text{[11:105]}$$

in which

$V_{0A1}$ = the variance of the points on the linear regression line;

$V_{(0.1)A(1^2.1)}$ = the variance of the differences between the points on the second-degree line and the points on the linear line;

$V_{0.1,1^2}$ = the residual variance, or that of the divergences from the second-degree regression line.

The variance ratio test of the need of a linear regression line, that is, of $b \neq 0$ in [11:97], is

$$F_{1,\,N-2} = \frac{r_{01}^2\,(N-2)}{1 - r_{01}^2} \qquad \text{[11:106] (F to test need of at least linear regression)}$$


Again, we have

$$F_{1,\,N-3} = \frac{\left(r^2_{0A1,1^2} - r_{01}^2\right)(N-3)}{1 - r^2_{0A1,1^2}} \qquad \text{[11:107] (F to test need of at least second-degree regression)}$$


Similarly,

$$F_{1,\,N-4} = \frac{\left(r^2_{0A1,1^2,1^3} - r^2_{0A1,1^2}\right)(N-4)}{1 - r^2_{0A1,1^2,1^3}} \qquad \text{[11:108] (F to test need of at least third-degree regression)}$$


And so forth to any degree of parabolic regression that one may desire to test.
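The sequence of tests [11:106] to [11:108] can be sketched as follows. This assumes NumPy's `polyfit` for the least-squares parabolas, though any least-squares fit would serve. `parabolic_r2` and `need_for_degree` are illustrative names, not from the text. P values would still have to be read from an F table or an F-distribution routine.

```python
import numpy as np

def parabolic_r2(x0, x1, degree):
    """Squared correlation between X0 and its least-squares parabola of the
    given degree in X1: r^2_{0A1}, r^2_{0A1,1^2}, ... (degree 0 gives 0)."""
    x0 = np.asarray(x0, dtype=float)
    x1 = np.asarray(x1, dtype=float)
    fitted = np.polyval(np.polyfit(x1, x0, degree), x1)
    return 1 - np.var(x0 - fitted) / np.var(x0)

def need_for_degree(x0, x1, j):
    """F with (1, N - j - 1) d.o.f. for the need of at least degree j,
    following the pattern of [11:106], [11:107], [11:108]."""
    N = len(x0)
    r2_j = parabolic_r2(x0, x1, j)
    r2_prev = parabolic_r2(x0, x1, j - 1)
    F = (r2_j - r2_prev) * (N - j - 1) / (1 - r2_j)
    return F, (1, N - j - 1)
```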

If the $X_1$ variable is grouped into k classes, a parabola of degree (k − 1) will pass exactly through the means of the arrays, so that a parabola of higher degree will give no better fit. The F test will show this, but before that point, if successively higher-degree parabolas have been employed, a P-from-F > .5 may have been obtained. This would indicate that the degree of the parabola used was too high, a better fit being gotten than was to be expected from the size of the sample dealt with. When successive tests are made, one should stop with the parabola of such degree that P from F at this point is approximately equal to .5.

Another procedure, leading to the same outcome and frequently less time-consuming, is to compute $\varepsilon^2$, the unbiased correlation ratio squared, and terminate the testing of higher- and higher-degree parabolas when the $r^2$ obtained is negligibly less than $\varepsilon^2$. Practical considerations might assert that an $r^2$ of, say, .800 was negligibly less than an $\varepsilon^2$ of .802, even though the sample was so large that P from F at this point < .5.

This method is of specific value in testing the adequacy of linear and other regression lines. Also, it is a very general method in that it applies when one variable is quantitative and the other qualitative.

In the case of linear regression, the predictand, $X_0$, can be expressed as the sum of the prediction, $X_{0A1}$, and an error of estimate, $x_{0.1}$. The following four equations have already been established:

$$X_0 = X_{0A1} + x_{0.1}$$

$$V_0 = V_{0A1} + V_{0.1} \qquad \text{analysis of variance equation}$$

$$N - 1 = 1 + (N-2) \qquad \text{d.o.f. equation}$$

$$r_{01}^2 = V_{0A1}/V_0$$

In the case of quadric regression we have:

$$X_0 = X_{0A1,1^2} + x_{0.1,1^2}$$

$$V_0 = V_{0A1,1^2} + V_{0.1,1^2}$$

$$N - 1 = 2 + (N-3)$$

$$r^2_{0A1,1^2} = V_{0A1,1^2}/V_0$$

Similarly for higher parabolic regression up to that of degree k − 1, at which point a perfect fit is attained; that is, the regression line passes exactly through the means of the k successive arrays. At this point we have:

$$X_0 = X_{0A1,1^2\ldots 1^{k-1}} + x_{0.1,1^2\ldots 1^{k-1}}$$

$$V_0 = V_{0A1,1^2\ldots 1^{k-1}} + V_{0.1,1^2\ldots 1^{k-1}}$$

$$N - 1 = (k-1) + (N-k)$$

$$r^2_{0A1,1^2\ldots 1^{k-1}} = V_{0A1,1^2\ldots 1^{k-1}}/V_0$$

The variance $V_{0A1,1^2\ldots 1^{k-1}}$ is simply that of the means of the $X_0$'s in the k successive arrays. Designating these means $M_{0i}$, in which i = 1, 2, ..., k, and the squared correlation coefficient in this case of perfect fit $\eta^2_{01}$, we have

$$\eta^2_{01} = \frac{V(M_{0i})}{V_0} = \frac{V_{0A1,1^2\ldots 1^{k-1}}}{V_0} \qquad \text{[11:109] (the raw correlation ratio squared)}$$


To obtain V(M Q] ) compute the k means of the arrays, 
weight each by the number of cases in the array, 
and compute the variance. 
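The computation just described can be sketched in a few lines. This assumes the observations are given as parallel sequences of $X_0$ values and class labels, and `correlation_ratio_sq` is a hypothetical name. Because only the grouping enters, the labels may be words as well as numbers.

```python
import numpy as np

def correlation_ratio_sq(x0, classes):
    """Raw correlation ratio squared, eta^2_{01} of [11:109].

    V(M_0i) is the variance of the array means, each mean weighted by the
    number of cases in its array; V_0 uses the divisor N, as in the text.
    """
    x0 = np.asarray(x0, dtype=float)
    classes = np.asarray(classes)
    M0 = x0.mean()
    weighted_ss = sum(
        np.sum(classes == c) * (x0[classes == c].mean() - M0) ** 2
        for c in np.unique(classes)
    )
    V_means = weighted_ss / len(x0)
    return V_means / x0.var()
```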

The numerical values, if there are any, that attach to the k class indexes of the $X_1$'s have not entered into this computation, so no ordered or quantitative relationship between the $X_1$ classes enters into this measure. Accordingly the correlation ratio is applicable when a quantitative relationship between the $X_1$ classes is unknown.

Also, of course, if there is a known quantitative relationship between the $X_1$ classes, the employment of $\eta^2$ uses but a part of the available information.

It is not to be expected that any regression line of less degree than k − 1 will exactly hit the means of all the arrays. Accordingly $V_{0A1}, V_{0A1,1^2}, V_{0A1,1^2,1^3}, \ldots, V_{0A1,1^2\ldots 1^{k-2}}$ constitute an increasing series whose limit is $V(M_{0i})$. Also $r^2_{0A1}, r^2_{0A1,1^2}, \ldots, r^2_{0A1,1^2\ldots 1^{k-2}}$ constitute an increasing series whose limit is $\eta^2_{01}$. The difference between $\eta^2_{01}$ and $r^2_{0A1,1^2\ldots 1^j}$ is a measure of the adequacy of the j-degree regression line to account for the relationship between $X_0$ and $X_1$.

However, when we bear in mind that the relationship that we seek to discover is that maintaining in the population and not that in the sample, we must believe that a regression line that fits the sample perfectly overstates the intimacy of the relationship in the population. We should therefore discount, or shrink, both $r^2$ and $\eta^2$ to get estimates of the population values before we compare them. Both $r^2$ and $\eta^2$ are equal to expressions of the sort

$$1 - \frac{V(\text{errors of estimate})}{V_0}$$

A similar expression using estimates of the population variances will yield the desired shrunken values of $r^2$ and $\eta^2$, which we label ${}_s r^2$ and $\varepsilon^2$. For estimated population variances we have:

$$\hat V_0 = \frac{N\,V_0}{N-1} \quad \text{(see [6:09])}; \qquad \hat V_{0.1,1^2\ldots 1^j} = \frac{N\,V_{0.1,1^2\ldots 1^j}}{N-j-1}; \qquad \hat V_{0.1,1^2\ldots 1^{k-1}} = \frac{N\,V_{0.1,1^2\ldots 1^{k-1}}}{N-k} \qquad \text{[11:113]}$$


Accordingly the shrunken values of the squared correlations are

$${}_s r^2_{01} = {}_s r^2_{0A1} = \frac{(N-1)\,r^2_{0A1} - 1}{N-2} \qquad \text{(correction in } r^2 \text{ for shrinkage, linear regression; cf. [12:36])}$$

$${}_s r^2_{0A1,1^2} = \frac{(N-1)\,r^2_{0A1,1^2} - 2}{N-3} \qquad \text{[11:114] (correction in } r^2 \text{ for shrinkage, quadric regression)}$$

$${}_s r^2_{0A1,1^2\ldots 1^j} = \frac{(N-1)\,r^2_{0A1,1^2\ldots 1^j} - j}{N-j-1} \qquad \text{(correction in } r^2 \text{ for shrinkage, parabolic regression of } j\text{th degree; cf. [12:36])}$$

$$\varepsilon^2_{01} = \frac{(N-1)\,\eta^2_{01} - (k-1)}{N-k} = 1 - \frac{N-1}{N-k}\left(1 - \eta^2_{01}\right) \qquad \text{[11:117]}$$

Correction in $\eta^2$ for shrinkage, k classes (Kelley, 1935, and Peters and Van Voorhis, 1940, Ch. XI and XIII), also called correction for fine categories.

If a series of ${}_s r^2$ values determined from linear, quadric, cubic, etc., regressions is computed, the regression yielding an ${}_s r^2$ most nearly equal to $\varepsilon^2$ is the optimal regression, and the corresponding ${}_s r^2$ is optimal. For ${}_s r^2 < \varepsilon^2$ the regression employed does not get all that is of significance, and for ${}_s r^2 > \varepsilon^2$ the regression used has presumably profited by chance.
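A minimal sketch of the shrinkage corrections and the selection rule just stated follows. The figures in the usage line are hypothetical, chosen only to show the comparison, and the function names are illustrative.

```python
def shrunken_r2(r2, N, j):
    """Shrunken r^2 for a parabolic regression of degree j
    (j = 1 linear, j = 2 quadric, ...), cf. [11:114] and [12:36]."""
    return ((N - 1) * r2 - j) / (N - j - 1)

def epsilon_sq(eta2, N, k):
    """Unbiased correlation ratio squared for k classes, [11:117]."""
    return ((N - 1) * eta2 - (k - 1)) / (N - k)

def optimal_degree(r2_by_degree, eta2, N, k):
    """Degree whose shrunken r^2 lies nearest epsilon^2."""
    eps2 = epsilon_sq(eta2, N, k)
    return min(r2_by_degree,
               key=lambda j: abs(shrunken_r2(r2_by_degree[j], N, j) - eps2))

# Hypothetical figures: N = 101 cases, k = 6 classes of X_1.
print(optimal_degree({1: 0.30, 2: 0.48, 3: 0.485}, eta2=0.50, N=101, k=6))  # prints 2
```

Here $\varepsilon^2 \approx .474$, and the shrunken quadric value (.469) lies slightly nearer to it than the shrunken cubic value, so the quadric would be taken as optimal.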

We may also test any obtained ${}_s r^2$ by the variance ratio

$$F_{k-j-1,\,N-k} = \frac{\left(\eta^2_{01} - r^2_{0A1,1^2\ldots 1^j}\right)/(k-j-1)}{\left(1 - \eta^2_{01}\right)/(N-k)} \qquad \text{[11:118] (to test adequacy of } r^2_{0A1,1^2\ldots 1^j})$$

As here employed, j equals the degree of the parabolic regression employed and k is the number of classes in the predictor variable.
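The variance ratio [11:118] reduces to simple arithmetic. The sketch below returns F with its degrees of freedom. The numbers in the example are hypothetical. If SciPy is available, an upper-tail P could be obtained with `scipy.stats.f.sf(F, df1, df2)`, though that step is not shown here.

```python
def adequacy_F(eta2, r2_j, N, k, j):
    """Variance ratio of [11:118]: does the degree-j parabola account for the
    relationship as well as the k array means do?  Returns (F, df1, df2)."""
    df1, df2 = k - j - 1, N - k
    F = ((eta2 - r2_j) / df1) / ((1 - eta2) / df2)
    return F, df1, df2

# Hypothetical figures: eta^2 = .50 from k = 6 classes, quadric r^2 = .48, N = 101.
F, df1, df2 = adequacy_F(0.50, 0.48, N=101, k=6, j=2)
```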

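For some purposes the variance error of $\varepsilon^2$, or of $\varepsilon$, will suffice. As derived by Kelley (op. cit.), these are

$$V(\varepsilon^2) = \frac{(1-\varepsilon^2)^2}{N-1}\left[\frac{2(k-1)}{N-k} + 4\varepsilon^2\right] \qquad \text{[11:119]}$$

$$V(\varepsilon) = \frac{V(\varepsilon^2)}{4\varepsilon^2} \qquad \text{(valid only when } \varepsilon^2 \text{ is not small)} \qquad \text{[11:120]}$$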

Multiple and partial correlation ratios may be computed for multivariate categorical series. A three-variable illustration is given by Kelley in Rietz's Handbook (1924).


CHAPTER XII 

THE GENERAL MULTIPLE LINEAR 
REGRESSION PROBLEM 

SECTION 1. A STATEMENT OF RELATIONSHIPS IN CONNECTION
WITH STANDARD SCORE VARIABLES

Let it be desired to estimate $X_0$ from a knowledge of n independent measures, $X_1, X_2, \ldots, X_n$. The usual notation will be used for variances, standard deviations, correlations, and covariances. Standard scores are defined as earlier and designated with the letter z: $z = (X - M)/\sigma$. The desired linear equation of estimate, or regression equation, is

$$X_{0A12\ldots n} = a + b_1X_1 + b_2X_2 + \cdots + b_nX_n \qquad \text{[12:01] (multiple regression equation)}$$

When involving deviation-from-the-mean scores this becomes

$$x_{0A12\ldots n} = b_1x_1 + b_2x_2 + \cdots + b_nx_n \qquad \text{[12:02]}$$

and when employing standard scores it becomes

$$z_{0A12\ldots n} = \beta_1z_1 + \beta_2z_2 + \cdots + \beta_nz_n \qquad \text{[12:03] (standard score form of regression equation)}$$
in which there is no constant term, for were there one, the estimate of $z_0$ corresponding to the mean of the right-hand member, treated as a single variable, would not equal zero, which we know must be the case from the two-variable problem. The constants of [12:01] and [12:02] follow from the β's:

$$b_i = \beta_i\,\frac{\sigma_0}{\sigma_i} \qquad \text{[12:04]}$$

$$a = M_0 - b_1M_1 - b_2M_2 - \cdots - b_nM_n \qquad \text{[12:05]}$$

When [12:03] is employed, the error of an estimate is $z_{0.12\ldots n}$:

$$z_{0.12\ldots n} = z_0 - z_{0A12\ldots n} = z_0 - \beta_1z_1 - \beta_2z_2 - \cdots - \beta_nz_n \qquad \text{[12:06]}$$


The variance error of estimate, which is to be made minimal by the proper selection of the β's, is

$$k^2_{0.12\ldots n} = V(z_{0.12\ldots n}) = 1 + \beta_1^2 + \beta_2^2 + \cdots + \beta_n^2 - 2\beta_1r_{01} - \cdots - 2\beta_nr_{0n} + 2\beta_1\beta_2r_{12} + \cdots \qquad \text{[12:07]}$$

In the two-variable problem, involving $z_0$ and y, the y being a variable whose mean is zero and whose variance is $V_y$, we have

$$z_{0Ay} = r_{0y}\,\frac{y}{\sigma_y} \quad \text{and} \quad V(z_{0Ay}) = r_{0y}^2 \qquad \text{[12:08]}$$

Accordingly, by treating the entire right-hand member of [12:03] as a single variable, we obtain

$$V(z_{0A12\ldots n}) = r^2_{0A12\ldots n} \qquad \text{[12:09]}$$
As with the two-variable problem,

$$z_0 = z_{0A12\ldots n} + z_{0.12\ldots n} \qquad \text{[12:10]}$$

in which the two terms in the right-hand member are independent, so that computing the variance of $z_0$ yields

$$1 = r^2_{0A12\ldots n} + k^2_{0.12\ldots n} \qquad \text{[12:11]}$$

The error of estimate $(z_0 - z_{0A12\ldots n})$ is uncorrelated with $z_i$. Otherwise $z_i$, already used in the regression equation [12:03], could be employed again to reduce the error of estimate, and this is impossible because the β's in [12:03] are such as to yield a minimal variance error of estimate. We thus have, for the covariance between $z_i$ and $(z_0 - z_{0A12\ldots n})$,

$$c\left[(z_0 - z_{0A12\ldots n})\,z_i\right] = c(z_0z_i) - c(z_{0A12\ldots n}z_i) = 0$$

$$c(z_{0A12\ldots n}z_i) = c(z_0z_i) = r_{0i} \qquad \text{[12:12]}$$

Also

$$c(z_{0A12\ldots n}z_1) = c\left[z_1(\beta_1z_1 + \beta_2z_2 + \cdots + \beta_nz_n)\right] = \beta_1V(z_1) + \beta_2c(z_1z_2) + \cdots + \beta_nc(z_1z_n)$$

or

$$r_{01} = \beta_1 + r_{12}\beta_2 + r_{13}\beta_3 + \cdots + r_{1n}\beta_n \qquad \text{[12:13a]}$$

Similarly

$$r_{02} = r_{12}\beta_1 + \beta_2 + r_{23}\beta_3 + \cdots + r_{2n}\beta_n \qquad \text{[12:13b]}$$

$$\vdots$$

$$r_{0n} = r_{1n}\beta_1 + r_{2n}\beta_2 + r_{3n}\beta_3 + \cdots + \beta_n$$
These n equations are the "condition equations" or "normal equations." Solved simultaneously, they yield the requisite values of the β's. Thence substitution in [12:04] and [12:05] provides the constants of equation [12:01], which is the form needed for practical use.

$r_{0A12\ldots n}$ can be computed from $k_{0.12\ldots n}$, [12:11], which can be gotten from [12:07], or, very expeditiously, from [12:14] following:

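We note that

$$c(z_0\,z_{0A12\ldots n}) = c\left[z_0(\beta_1z_1 + \beta_2z_2 + \cdots + \beta_nz_n)\right] = r_{01}\beta_1 + r_{02}\beta_2 + \cdots + r_{0n}\beta_n$$

but since, noting [12:08],

$$c(z_0\,z_{0A12\ldots n}) = \sigma_{z_{0A12\ldots n}}\,r_{0A12\ldots n} = r^2_{0A12\ldots n}$$

we have

$$r^2_{0A12\ldots n} = r_{01}\beta_1 + r_{02}\beta_2 + \cdots + r_{0n}\beta_n \qquad \text{[12:14]}$$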
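A short Python sketch (NumPy assumed; names and correlations hypothetical) shows that solving the normal equations [12:13], computing $r^2_{0A12\ldots n}$ by [12:14], and writing out $k^2$ term by term as in [12:07] satisfy [12:11]. It uses a general-purpose solver, not one of the special methods discussed next.

```python
import numpy as np

def standard_score_regression(R, r0):
    """Solve the normal equations [12:13] for the beta's and return them with
    r^2_{0A12...n} from [12:14].  R: n x n predictor correlations; r0: r_0i."""
    R = np.asarray(R, dtype=float)
    r0 = np.asarray(r0, dtype=float)
    betas = np.linalg.solve(R, r0)
    return betas, float(r0 @ betas)

def alienation_sq(R, r0, betas):
    """k^2_{0.12...n} written out as in [12:07] (diagonal r_ii = 1)."""
    R = np.asarray(R, dtype=float)
    r0 = np.asarray(r0, dtype=float)
    return float(1 - 2 * r0 @ betas + betas @ R @ betas)

# Hypothetical two-predictor example
R = [[1.0, 0.3], [0.3, 1.0]]
r0 = [0.5, 0.4]
betas, r2 = standard_score_regression(R, r0)
k2 = alienation_sq(R, r0, betas)      # by [12:11], r2 + k2 should equal 1
```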

Many methods have been devised for solving the simultaneous equations [12:13]. Among them are the following, each with distinctive features. The determinantal solution, as given in Section 3 of this chapter, parallels and reveals theoretical development and is not uneconomical if four or fewer variables are involved. The Doolittle (1878) method (see also Dwyer, 1941, Doolittle, and 1941, Solution), one of the many variations of which is given in the next section (this modification was suggested in part by an article by Cowden, 1943), is generally serviceable, though it becomes rather laborious with 20 variables or more. Aitken's (1937) method of pivotal condensation has much similarity with, and many of the merits of, the Doolittle procedure. The Kelley-Salisbury (1926) method (see also Kelley-McNemar, 1929, and Tolley-Ezekiel, 1927), or other iterative procedure, is designed for the many-variable problem.

SECTION 2. A MODIFIED DOOLITTLE SOLUTION

We here give a four-variable problem solved by a modified Doolittle method, expressed in terms of symbols so that what has happened at every step is apparent. The multiple-regression problem involving more than four variables does not involve any principles not covered in the four-variable illustration.

A Doolittle solution based upon gross scores can be made. This would immediately yield the equation [12:01]. There are two disadvantages in this procedure. It involves one more unknown, namely a, than a calculation based upon deviation scores. It also yields an answer, [12:01], which holds but part of the information commonly desired, for a knowledge of means and variances is frequently pertinent to a problem in addition to knowledge of the regression equation. It is therefore judged to be generally economical to compute means, variances, and covariances in the first instance and then to perform a Doolittle solution with these measures, leading to equation [12:02]. Adding the constant a, [12:05], yields [12:01], which is generally the most serviceable form of the regression equation for practical use. It is readily established that the condition equations when deviation-from-mean scores are used differ slightly from those consequent to standard scores, [12:13], and are, for the four-variable problem, as follows:

(1) $b_1V_1 + b_2c_{12} + b_3c_{13} = c_{01}$  cf. Table XII A, row 10

(2) $b_1c_{12} + b_2V_2 + b_3c_{23} = c_{02}$  cf. Table XII A, row 20

(3) $b_1c_{13} + b_2c_{23} + b_3V_3 = c_{03}$  cf. Table XII A, row 30

Because the coefficients of the unknown b's enter symmetrically (a situation leading to Gramian determinants), the solution of these particular simultaneous equations becomes relatively simple. In the modified Doolittle layout of Table XII A the coefficients $b_1$, $b_2$, $b_3$, and the equality sign have been dropped and certain elements added (arising from relationships in matrix algebra connected with the identity matrix). The student can identify the steps of the Doolittle process by reference to steps (4) to (9) herewith. The Doolittle method is simply a convenient arrangement of steps for the solution of simultaneous equations in which the coefficients of the unknowns are symmetrical. Dividing the terms of equation (1) by $(-V_1)$ we obtain

(4) $-b_1 - b_2b_{21} - b_3b_{31} = -b_{01}$  cf. Table XII A, row 1′

Multiply (1) by the second coefficient of (4), namely $-b_{21}$, obtaining

(5a) $-b_1V_1b_{21} - b_2c_{12}b_{21} - b_3c_{13}b_{21} = -c_{01}b_{21}$

(5b) $-b_1c_{12} - b_2r_{12}^2\sigma_2^2 - b_3r_{12}r_{13}\sigma_2\sigma_3 = -r_{01}r_{12}\sigma_0\sigma_2$

which, using a notation in which A in a subscript means "dependent upon," is

(5) $-b_1c_{12} - b_2V_{2A1} - b_3c_{23A1} = -c_{02A1}$  cf. Table XII A, row 21

Adding (2) and (5) yields an equation which is void of the $b_1$ term,

(6a) $b_2(V_2 - V_{2A1}) + b_3(c_{23} - c_{23A1}) = c_{02} - c_{02A1}$

which, using a dot in a subscript to mean "independent of," may be written

(6) $b_2V_{2.1} + b_3c_{23.1} = c_{02.1}$  cf. Table XII A, row 2

Dividing by $(-V_{2.1})$ we have

(7) $-b_2 - b_3b_{32.1} = -b_{02.1}$  cf. Table XII A, row 2′

Multiplying (1) by the third coefficient of equation (4), namely $(-b_{31})$, we obtain

(8) $-b_1c_{13} - b_2c_{23A1} - b_3V_{3A1} = -c_{03A1}$  cf. Table XII A, row 31

Adding (3) and (8) gives a second equation void of the $b_1$ term,

(9) $b_2c_{23.1} + b_3V_{3.1} = c_{03.1}$  cf. Table XII A, sum of rows 30 and 31

Equations (6) and (9) represent a three-variable problem in which the coefficients of the b's are symmetrical. Thus we can solve these in a similar manner, obtaining a final equation yielding $b_3$, which in the fuller notation of Table XII A is $b_{03.12}$.
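The elimination in steps (4) to (9) can be sketched as plain forward elimination on the symmetric system. Each pivot row is divided by the negative of its pivot, multiplied by the coefficient in the column being cleared, and added to a later row. This sketch (NumPy assumed, name hypothetical) finishes with an ordinary back solution. It therefore does not reproduce Kelley's modified layout, which avoids that step. The covariances are those of the sample problem treated below.

```python
import numpy as np

def doolittle_solve(C, c0):
    """Forward elimination on symmetric condition equations C b = c0 in the
    manner of steps (4)-(9), followed by an ordinary back solution."""
    A = np.column_stack([np.asarray(C, dtype=float), np.asarray(c0, dtype=float)])
    n = A.shape[0]
    for p in range(n):
        reduced = -A[p] / A[p, p]            # e.g. step (4): row 1 divided by (-V_1)
        for r in range(p + 1, n):
            # e.g. steps (5) and (2) -> (6); A[p, r] equals A[r, p] by symmetry
            A[r] = A[r] + A[p, r] * reduced
    b = np.zeros(n)
    for p in reversed(range(n)):
        b[p] = (A[p, n] - A[p, p + 1:n] @ b[p + 1:]) / A[p, p]
    return b

# Covariances among x_1, x_2, x_3 and with x_0 for the sample problem
C = [[92.80, 47.62, 31.70], [47.62, 149.92, 10.47], [31.70, 10.47, 203.08]]
c0 = [27.71, 28.92, 20.05]
b = doolittle_solve(C, c0)   # close to .217055, .119858, .058671
```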

Table XII A gives a Doolittle solution for a four-variable problem in algebraic notation. It is modified to give all the regression coefficients, thus obviating a "backward solution," and to give additional essential constants in rows B and C. When taking the square root of the squared partial correlation coefficients, the signs of the corresponding regression coefficients in row 0 must be attached. When computing the multiple correlation from the relationship

$$r^2_{0A123} = 1 - k^2_{0.123} = 1 - \frac{V_{0.123}}{V_0} \qquad \text{(see rows 0 and 00)} \qquad \text{[12:15]}$$

the sign is always positive. As illustration of the dot and delta notation we note

$$V_0 = V_{0A1} + V_{0.1} \qquad \text{(see [10:16])}$$

That is, the variance of $x_0$ may be set equal to the variance of a part which is dependent upon $x_1$ plus the variance of a part which is independent of $x_1$. And again,

$$c_{01} = c_{01A2} + c_{01.2} \qquad \text{[12:16]}$$

or the covariance $c_{01}$ is equal to a part which is dependent upon $x_2$ plus a part which is independent of $x_2$. Since

$$c_{01.2} = (r_{01} - r_{02}r_{12})\,\sigma_0\sigma_1 \qquad \text{[12:17]}$$

[12:16] may be written

$$r_{01}\sigma_0\sigma_1 = r_{02}r_{12}\,\sigma_0\sigma_1 + (r_{01} - r_{02}r_{12})\,\sigma_0\sigma_1$$

so that

$$c_{01A2} = r_{02}r_{12}\,\sigma_0\sigma_1 \qquad \text{[12:18]}$$

The addition of any number of the same secondary subscripts to each symbol does not change this relationship.

$$b_{01} = \frac{c_{01}}{V_1} \qquad \text{[12:19]}$$

The addition of any number of the same secondary subscripts does not change this relationship.

$$b_{01} = b_{21}\,b_{02.1} + b_{01.2} \qquad \text{[12:20]}$$

The addition of similar secondary subscripts to every symbol does not change this relationship.

The condition equations whose solution is provided for in Table XII A are numbers (1), (2), and (3). For future reference we add a symmetrical fourth row and write the coefficients of the unknowns in the accompanying augmented matrix:

$$\begin{bmatrix} V_1 & c_{12} & c_{13} & -c_{01}\\ c_{12} & V_2 & c_{23} & -c_{02}\\ c_{13} & c_{23} & V_3 & -c_{03}\\ -c_{01} & -c_{02} & -c_{03} & V_0 \end{bmatrix} \qquad \text{[12:21]}$$


In this arrangement the row and column connected with the criterion variable are placed last. The student will find it generally advantageous to arrange the predictor variables in an order, $x_1, x_2, x_3$, judged to be one of decreasing importance as predictors. The advantage of this order is twofold. First, the crucial regression coefficient to test for significance is then $b_{03.12}$, which is the easiest to obtain. Precisely to test for the significance of $b_{03.12}$ one requires the few computations of column III. Those of columns I and II can be made later or not at all, as desired, depending upon the outcome of the test for significance of $b_{03.12}$. Second, should $b_{03.12}$ prove nonsignificant, one would then desire the regression equation omitting $x_3$. Most of the computational work for this is already at hand in the columns and rows of Table XII A that do not involve $x_3$.

In the cell at the intersection of row 1′ and column $x_2$ is the entry (21). That is to say, the numerical value found for this cell need not be recorded in this cell, since it is in a more useful position if recorded in the multiplier column, row 21. Similarly for (31), (01), (32), etc.

Tables XII B and XII C provide the numerical 
data for the four-variable sample problem solved 
in Table XII E. 
TABLE XII B

SAMPLE 4-VARIABLE MULTIPLE CORRELATION PROBLEM
Means and Variances, N = 125

             X₁        X₂        X₃        X₀
   M       52.20     13.60     15.10     68.15
   V       92.80    149.92    203.08    110.32
   σ        9.63     12.24     14.25     10.50

TABLE XII C

AUGMENTED VARIANCE AND COVARIANCE MATRIX
(−c₀ᵢ is entered)

             x₁        x₂        x₃        x₀
   x₁      92.80     47.62     31.70    −27.71
   x₂               149.92     10.47    −28.92
   x₃                         203.08    −20.05
   x₀                                   110.32


TABLE XII D

AUGMENTED CORRELATION MATRIX
(−r₀ᵢ is entered)

             x₁        x₂        x₃        x₀
   x₁      1.000      .404      .231     −.274
   x₂                1.000      .060     −.225
   x₃                          1.000     −.134
   x₀                                    1.000
TABLE XII (CONTINUED)
It is to be noted that the check column applies to the work between the double vertical rules. To obtain a check of the b's it is necessary to substitute them in one of the condition equations and see if it reduces to zero. This is most readily done employing the constants of the first condition equation, as recorded in row 10 of Table XII E. In symbols the equation is

$$b_{01.23}V_1 + b_{02.13}c_{12} + b_{03.12}c_{13} - c_{01} = 0$$

Thus we have

$$.217055 \times 92.80 + .119858 \times 47.62 + .058671 \times 31.70 - 27.71 = .0002$$

This being zero within the accuracy of the computational procedure constitutes a check.

There is no independent check upon the computational work below row 0, so it is important that this be performed with extra caution.

The multiple correlation squared, as given by [12:15], is .0966, but this has a bias and should be corrected for shrinkage by [12:36]. The corrected value is .0742, or ${}_s r_{0A123} = .2724$. Since the single correlation $r_{01}$ (see Table XII C) equals .274, it is futile to use all three predictors in this particular problem, but we shall proceed to apply precise tests to the multiple correlation coefficient and to the regression coefficients. Utilizing [12:37], we have

$$F_{3,121} = \frac{.0966 \times 121}{.9034 \times 3} = 4.3128$$

This being greatly in excess of 1.0, we may be sure that $r_{0A123}$ is not a chance deviation from zero. The computation of P for this variance ratio will only confirm this belief. We do, in fact, find that P = .038.
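The figures of this sample problem can be reproduced with a few lines of Python (NumPy assumed; variable names are ours). The covariances used are those under which the printed b's satisfy all three condition equations, in particular $c_{12} = 47.62$, $V_2 = 149.92$, and $c_{23} = 10.47$. The residual variance is taken as $V_{0.123} = V_0 - \sum b_i c_{0i}$. The sketch stops at F and does not compute P.

```python
import numpy as np

# Sample problem: covariances among x_1, x_2, x_3, their covariances with x_0,
# V_0, N, and the number of predictors n.
C = np.array([[92.80, 47.62, 31.70],
              [47.62, 149.92, 10.47],
              [31.70, 10.47, 203.08]])
c0 = np.array([27.71, 28.92, 20.05])
V0, N, n = 110.32, 125, 3

b_text = np.array([0.217055, 0.119858, 0.058671])  # b_01.23, b_02.13, b_03.12 as printed
check = C[0] @ b_text - c0[0]                      # first condition equation, about .0002

b = np.linalg.solve(C, c0)
V0_123 = V0 - b @ c0                               # V_{0.123}, about 99.66
r2 = 1 - V0_123 / V0                               # [12:15], about .0966
s_r2 = ((N - 1) * r2 - n) / (N - n - 1)            # shrinkage correction, about .0742
F = r2 * (N - n - 1) / ((1 - r2) * n)              # F with (3, 121) d.o.f., about 4.31
```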


To test the significance of a multiple-regression coefficient we have a variance ratio of the sort [10:78], obtained by adding secondary subscripts and noting that the error variance has (N − n − 1) degrees of freedom:

$$F_{1,\,N-n-1} = \frac{b^2_{0i.12\ldots)i(\ldots n}\;V_{i.12\ldots)i(\ldots n}\,(N-n-1)}{V_{0.12\ldots n}} \qquad \text{[12:22] (see also [12:24])}$$


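For the two-variable problem we have

$$V_i = V_{iA0} + V_{i.0} = \frac{V_{i.0}}{1 - r_{0i}^2}$$

Accordingly, adding secondary subscripts 12...)i(...n throughout, we obtain

$$V_{i.12\ldots)i(\ldots n} = \frac{V_{i.012\ldots)i(\ldots n}}{1 - r^2_{0i.12\ldots)i(\ldots n}} \qquad \text{[12:23]}$$

Substituting in [12:22] yields

$$F_{1,\,N-n-1} = \frac{b^2_{0i.12\ldots)i(\ldots n}\;V_{i.012\ldots)i(\ldots n}\,(N-n-1)}{V_{0.12\ldots n}\left(1 - r^2_{0i.12\ldots)i(\ldots n}\right)} \qquad \text{[12:24]}$$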


Since the negative of the reciprocal of the numerator variance is given in row B and the square of the partial correlation coefficient in row C, all the necessary constants are available to apply this test. Also, since this is a variance ratio having one d.o.f. in the numerator, its square root is a critical ratio. If the d.o.f. of the denominator is not small, the square root of this F may be treated as a deviate in a unit normal distribution.

The variance ratio to test whether $b_{03.12}$ is a chance deviation from zero is

$$F_{1,121} = \frac{(.058671)^2 \times 121}{.005244 \times 99.6628 \times .993326} = .802215$$

$\sqrt{F_{1,121}} = .8957$ and P = .370.

Similarly, for $b_{02.13}$ we obtain P = .139, and for $b_{01.23}$ we obtain P = .040.

It is equally effective to test either the partial correlation coefficient or the partial regression coefficient as a chance deviation from zero. We have $r_{03.12} = \sqrt{.006674} = .0817$. Using the Fisher r-into-z transformation, we obtain $z_{03.12} = .08188$. This has a standard error of .09129 $(= 1/\sqrt{121 - 1})$. The critical ratio is .8969 and, to three decimal places, P = .370, which is equal to that previously obtained for $b_{03.12}$ as a chance deviation from zero. These two tests are alternatives. The r-into-z procedure gives a P of .142 for $r_{02.13}$, which is to be compared with the previous value .139. Similarly, for $r_{01.23}$ we obtain P = .042, to be compared with .040.
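The r-into-z test just applied to $r_{03.12}$ can be sketched with the standard library alone. The variance error of z is taken as $1/(N - n - 2)$, the form the worked example uses with N = 125 and n = 3. The two-tailed P comes from the complementary error function. The function name is hypothetical.

```python
import math

def partial_r_z_test(r_partial, N, n):
    """Fisher r-into-z test of a partial correlation in an (n + 1)-variable
    problem, with variance error of z equal to 1/(N - n - 2).
    Returns z, its standard error, the critical ratio, and a two-tailed P."""
    z = math.atanh(r_partial)
    se = 1 / math.sqrt(N - n - 2)
    cr = z / se
    p = math.erfc(abs(cr) / math.sqrt(2))   # two-tailed unit normal probability
    return z, se, cr, p

z, se, cr, p = partial_r_z_test(math.sqrt(0.006674), N=125, n=3)
```

With these inputs the sketch gives z of about .0819, a standard error of about .0913, a critical ratio of about .897, and P of about .370. These agree with the values in the text.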


For the significance of $M_0$ we have a test similar to [10:38]. We note that (n + 1) linear restrictions have been imposed upon the $X_0$ measures. They are the restrictions inherent in $M_0, c_{01}, c_{02}, \ldots, c_{0n}$. Thus we have

$$F_{1,\,N-n-1} = \frac{(X_0 - M_0)^2\,(N-n-1)}{V_{0.12\ldots n}} \qquad \text{[12:25] (variance ratio test of } M_0 \text{ in the case of multiply correlated data)}$$

SECTION 3. THE DETERMINANTAL EXPRESSION OF THE SOLUTION
OF NORMAL EQUATIONS


In connection with the condition equations [12:13] let us write down the augmented correlation major determinant, Δ, or A. This corresponds to [12:21] except that the variables here are standard scores.

$$\Delta = \begin{vmatrix} 1 & r_{12} & \cdots & r_{1n} & r_{01}\\ r_{21} & 1 & \cdots & r_{2n} & r_{02}\\ \vdots & \vdots & & \vdots & \vdots\\ r_{n1} & r_{n2} & \cdots & 1 & r_{0n}\\ r_{01} & r_{02} & \cdots & r_{0n} & 1 \end{vmatrix} \qquad \text{[12:26]}$$


The following relationships hold. In them $A_{0i}$ is, of course, the determinant obtained by crossing out the last row and the variable-i column of the major determinant Δ, and $\Delta_{0i} = (-1)^{n+i+1}A_{0i}$:

$$k^2_{0.12\ldots n} = \frac{\Delta}{\Delta_{00}} \qquad \text{[12:27]}$$

$$r^2_{0A12\ldots n} = 1 - \frac{\Delta}{\Delta_{00}} \qquad \text{[12:28]}$$

$$\beta_1 = -\frac{\Delta_{01}}{\Delta_{00}}, \quad \beta_2 = -\frac{\Delta_{02}}{\Delta_{00}}, \quad \beta_3 = -\frac{\Delta_{03}}{\Delta_{00}}, \ \text{etc.}; \quad \text{in general} \quad \beta_i = -\frac{\Delta_{0i}}{\Delta_{00}} \qquad \text{[12:29]}$$


$$r^2_{0i.12\ldots)i(\ldots n} = \beta_{0i.12\ldots)i(\ldots n}\;\beta_{i0.12\ldots)i(\ldots n} \qquad \text{[12:30]}$$

$\beta_{0i.12\ldots)i(\ldots n}$ is the expanded notation for $\beta_i$, as given by [12:29]. The reversed parentheses, )i(, indicate that i is omitted in the series of secondary subscripts 1, 2, ..., n. Since $A_{0i} = A_{i0}$, and the sign factor for the position (0, i) is the same as for the position (i, 0), we can write

$$r^2_{0i.12\ldots)i(\ldots n} = \frac{\Delta_{0i}^2}{\Delta_{00}\,\Delta_{ii}} \qquad \text{[12:31] (the squared partial correlation coefficient)}$$

When extracting the square root to obtain $r_{0i.12\ldots)i(\ldots n}$, the sign is the sign of $-\Delta_{0i}$; in other words, it is the sign of $\beta_i$.
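The determinantal formulas can be checked numerically. The sketch below (NumPy assumed; names and correlations hypothetical) builds Δ of [12:26] with the criterion last, takes signed cofactors, and applies [12:28], [12:29], and [12:31], attaching the sign of $\beta_i$ to each partial r. Evaluating determinants directly only mirrors the text and is not an efficient method for many variables.

```python
import numpy as np

def cofactor(M, i, j):
    """Signed cofactor of element (i, j) of the square matrix M."""
    minor = np.delete(np.delete(M, i, axis=0), j, axis=1)
    return (-1) ** (i + j) * np.linalg.det(minor)

def determinantal_solution(R, r0):
    """Apply [12:28], [12:29], [12:31] to the augmented determinant of [12:26].

    The criterion occupies the last row and column (index n).  Returns the
    beta's, r^2_{0A12...n}, and the signed partial r_{0i.12...)i(...n}.
    """
    R = np.asarray(R, dtype=float)
    r0 = np.asarray(r0, dtype=float)
    n = len(r0)
    Delta = np.block([[R, r0[:, None]], [r0[None, :], np.ones((1, 1))]])
    D00 = cofactor(Delta, n, n)
    betas = np.array([-cofactor(Delta, n, i) / D00 for i in range(n)])  # [12:29]
    r2_multiple = 1 - np.linalg.det(Delta) / D00                        # [12:28]
    partial_sq = np.array([cofactor(Delta, n, i) ** 2 / (D00 * cofactor(Delta, i, i))
                           for i in range(n)])                           # [12:31]
    partial_r = np.sign(betas) * np.sqrt(partial_sq)  # sign is that of beta_i
    return betas, r2_multiple, partial_r

betas, r2, partial_r = determinantal_solution([[1.0, 0.3], [0.3, 1.0]], [0.5, 0.4])
```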

Since /B. vanishes with this partial correla- 
tion, cfte significance of /3] as a deviation from 
zero is exactly the significance of 

**0 ? . 1 2 n 

as a deviation from zero. The distribution of a 
partial correlation coefficient is the same as 
that of any other correlation coefficient, when 
proper allowance is made for its reduced degrees 
of freedom. The total correlation coefficient, 
r Q1 , has (A’-2) degrees of freedom; its variance 
error is (l~ r Q i) 2 /(A'-2) , as given by [10:46], of 
Chapter X; and when transformed into a Fisher z, 
see [10:43], it has a variance error, see [10:44], 
of 1/ (TV— 3 ) . Accordingly, in the case of this 
partial correlation in this (n+1) variable prob- 
lem, 
$$V\left(r_{0i.12\ldots)i(\ldots n}\right) = \frac{\left(1 - r^2_{0i.12\ldots)i(\ldots n}\right)^2}{N-n-1} \qquad \text{Variance error of partial } r \qquad \text{[12:32]}$$

$$V(z \text{ from partial } r) = \frac{1}{N-n-2} \qquad \text{Variance error of partial } z \qquad \text{[12:33]}$$
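A minimal sketch of [12:32] and [12:33], assuming the usual Fisher transformation $z = \tanh^{-1} r$. The 1.96 multiplier (roughly 95 per cent limits under a normal approximation), the function names, and the illustrative partial $r = .18$ are assumptions, not part of the text.

```python
import math


def var_partial_r(r, N, n):
    """[12:32]: variance error of a partial r in an (n+1)-variable problem."""
    return (1.0 - r * r) ** 2 / (N - n - 1)


def var_partial_z(N, n):
    """[12:33]: variance error of Fisher's z computed from a partial r."""
    return 1.0 / (N - n - 2)


def partial_r_limits(r, N, n, x=1.96):
    """Limits for a partial r: go to z = atanh(r), step x standard errors
    each way, and map back with tanh (x = 1.96 is an assumed choice)."""
    z = math.atanh(r)
    half_width = x * math.sqrt(var_partial_z(N, n))
    return math.tanh(z - half_width), math.tanh(z + half_width)


# Illustrative values only: a partial r of .18 with N = 125 and n = 3.
lo, hi = partial_r_limits(0.18, 125, 3)
print(round(lo, 3), round(hi, 3))
```

Working in $z$ keeps the limits inside $-1$ and $+1$ and makes them unsymmetrical about $r$, as they should be.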


Formula [12:32] gives the variance error of the correlation between $X_0$ and $X_1$ for fixed values of $X_2, X_3, \ldots, X_n$. This is to say that the sample in question is one of an infinite number, all of which have the identical $X_2, X_3, \ldots, X_n$ values, in the identical pairings ($X_2$ with $X_3$, $X_2$ with $X_4$, etc.), as in this observed sample. Thus the only variables which change from sample to sample are $X_0$ and $X_1$. The partial correlation $r_{01.23\ldots n}$, read "the correlation between $X_0$ and $X_1$ for fixed values of variables $X_2, X_3, \ldots, X_n$," has no meaning unless $X_2, X_3, \ldots, X_n$ are fixed. This partial correlation is indeterminate unless $X_2, X_3, \ldots, X_n$ have fixed values, so no formula based on the correlations in a single sample for the variance error of a partial correlation coefficient (or of a regression coefficient in a multiple variable problem) is possible under totally free sampling.

Should we have a number of totally free samplings and compute an $r_{01.23\ldots n}$ for each, these partial $r$'s would vary both in meaning and in magnitude, and their variance error could be computed, but, so far as the writer is aware, this has never been done. We must therefore look upon the partial correlation coefficient (and the multiple regression coefficient) as having a known variance error only when all variables, except the two primary ones in question, have fixed values.

The variance of $b_i$, as given by [12:34], is its variance when all variables except $X_0$ and $X_i$ have the fixed values of the observed sample; thus, by parity with [11:79],

$$V(b_i) = \frac{V_0\,k^2_{0.12\ldots n}}{V_i\,k^2_{i.12\ldots)i(\ldots n}\,(N-n-1)} \qquad \text{Variance error of a multiple regression coefficient} \qquad \text{[12:34]}$$

The divergence of the observed value, $b_i$, from any theoretical value, $\bar b_i$, is to be tested by the variance ratio

$$F_{1,(N-n-1)} = \frac{(b_i - \bar b_i)^2\,V_i\,k^2_{i.12\ldots)i(\ldots n}\,(N-n-1)}{V_0\,k^2_{0.12\ldots n}} \qquad \text{[12:35]}$$

(See also [12:24].)

When, as is common in multiple regression problems, $(N-n-1)$ is large, $\sqrt{F_{1,(N-n-1)}}$ is sensibly the same as the $x$ deviate in a unit normal distribution, so that double the area to the right of $x$ is a close approximation to the variance ratio $P$.
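The following sketch puts [12:34], [12:35], and the large-sample normal approximation side by side. The function names and every numerical input are hypothetical; "double the area to the right of $x$" under the unit normal curve is computed as $\operatorname{erfc}(x/\sqrt{2})$.

```python
import math


def var_b(v0, vi, k2_0, k2_i, N, n):
    """[12:34]: variance error of b_i, all other predictors held at their sample
    values.  k2_0 is k^2_{0.12...n}; k2_i is k^2_{i.12...)i(...n}."""
    return v0 * k2_0 / (vi * k2_i * (N - n - 1))


def f_for_b(b, b_hyp, v0, vi, k2_0, k2_i, N, n):
    """[12:35]: variance ratio with 1 and N-n-1 degrees of freedom."""
    return (b - b_hyp) ** 2 * vi * k2_i * (N - n - 1) / (v0 * k2_0)


def large_sample_p(F):
    """For large N-n-1, sqrt(F) acts as a unit normal deviate x, and
    P is double the area to the right of x, i.e. erfc(x / sqrt(2))."""
    x = math.sqrt(F)
    return math.erfc(x / math.sqrt(2.0))


# Hypothetical inputs, for illustration only.
F = f_for_b(b=0.20, b_hyp=0.0, v0=4.0, vi=2.5, k2_0=0.90, k2_i=0.80, N=125, n=3)
print(round(F, 3), round(large_sample_p(F), 4))
```

Note that $\sqrt{F}$ from [12:35] is simply $|b_i - \bar b_i|$ divided by the standard error implied by [12:34].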

The multiple correlation coefficient is a biased measure, being larger than is to be anticipated if the obtained regression is applied to a new sample and the correlation with a similar criterion computed. The amount of this bias is known (see Fisher, 1923, and Wherry, 1939), and the following formula applies, in which ${}_s r^2$ is $r^2$ corrected for shrinkage:

$${}_s r^2_{0\wedge 12\ldots n} = 1 - \left(1 - r^2_{0\wedge 12\ldots n}\right)\frac{N-1}{N-n-1} \qquad \text{[12:36]}$$

Giving correction in $r^2$ for shrinkage in the $(n+1)$ variable problem, cf. [11:116].
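Formula [12:36] is a one-line computation. The sketch below (hypothetical function name `shrunken_r2`) applies it to the four-variable example of Section 4, where $r^2 = .0967$, $N = 125$, and $n = 3$.

```python
def shrunken_r2(r2, N, n):
    """[12:36]: multiple r^2 corrected for shrinkage; n predictors, N cases."""
    return 1.0 - (1.0 - r2) * (N - 1) / (N - n - 1)


# Four-variable example: r^2 = .0967, N = 125, n = 3.
print(round(shrunken_r2(0.0967, 125, 3), 4))
```

The corrected value is about .0743. The correction can carry $r^2$ below zero, which happens whenever the observed $r^2$ is less than $n/(N-1)$.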
The limiting values of multiple $r$ are 0 and 1, so the form of its distribution is not identical with that of a total correlation coefficient, and we shall not use the $r$-into-$z$ technique of Chapter X. We may compute a variance ratio, similar to [10:42], to test the hypothesis that the true multiple $r^2$ is equal to 0. Corresponding to the variance equation,

$$V_0 = V_{0\wedge 12\ldots n} + V_{0.12\ldots n} = V_0\,r^2_{0\wedge 12\ldots n} + V_0\left(1 - r^2_{0\wedge 12\ldots n}\right)$$

is the degrees of freedom equation,

$$(N-1) = n + (N-n-1)$$

The multiple correlation coefficient has as many degrees of freedom as there are independent variables, $n$, and the error variance has $N-n-1$, so that

$$F_{n,(N-n-1)} = \frac{r^2_{0\wedge 12\ldots n}\,(N-n-1)}{\left(1 - r^2_{0\wedge 12\ldots n}\right)n} \qquad \text{to test the hypothesis that } r^2_{0\wedge 12\ldots n} = 0 \qquad \text{[12:37]}$$
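A sketch of the variance ratio [12:37] (hypothetical function name), applied to the four-variable example, where $r^2 = 1 - A/R = .096681$, $N = 125$, and $n = 3$.

```python
def f_multiple_r2(r2, N, n):
    """[12:37]: variance ratio for the hypothesis that the true multiple r^2 is 0.
    Returns F together with its degrees of freedom (n, N - n - 1)."""
    F = r2 * (N - n - 1) / ((1.0 - r2) * n)
    return F, (n, N - n - 1)


# Four-variable example: r^2 = .096681, N = 125, n = 3.
F, df = f_multiple_r2(0.096681, 125, 3)
print(round(F, 3), df)
```

Here $F$ is about 4.32 with 3 and 121 degrees of freedom, which exceeds the customary 5 per cent point (roughly 2.68 for 3 and 120 degrees of freedom).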

In addition to the relationships of this Sec- 
tion, further relationships are neatly expressed 
by means of matrices, as explained in Section 4. 

section 4. THE USE OF MATRICES IN EXPRESSING MULTIPLE 
CORRELATION RELATIONSHIPS 

The treatment herewith assumes the elementary familiarity with matrices and determinants represented by the discussion of Chapter XIV, Section 2. The illustration herewith is based upon a four-variable problem, but the relationships given hold for a problem with any number of variables. Were there a simple way to evaluate determinants of fairly high order, the solution of regression equations by means of matrices and determinants would undoubtedly be the standard method, for it brings into clear focus the underlying algebraic and geometric relationships. We start with an augmented matrix of type [12:38]. For condition equations [12:13] this is


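$$\mathcal{A} = \begin{pmatrix} 1 & r_{12} & r_{13} & r_{01} \\ r_{12} & 1 & r_{23} & r_{02} \\ r_{13} & r_{23} & 1 & r_{03} \\ r_{01} & r_{02} & r_{03} & 1 \end{pmatrix} \qquad \text{Augmented matrix} \qquad \text{[12:38]}$$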


The transpose of $\mathcal{A}$ is $\mathcal{A}'$ and its inverse is $\mathcal{A}^{-1}$. We will designate the elements entering into $\mathcal{A}$ as $a_{ij}$. The cofactor of $a_{ij}$ is $A_{ij}$, in which $i$ refers to the row and $j$ to the column. When $\mathcal{A}$, $\mathcal{A}'$, and $\mathcal{A}^{-1}$ are treated as determinants they are designated $A$, $A'$, and $A^{-1}$. The elements of $\mathcal{A}^{-1}$ (or of $A^{-1}$) are $a^{ji}$ (at the intersection of the $j$ row and the $i$ column).

$$a^{ji} = \frac{A_{ij}}{A} \qquad \text{Elements of the inverse matrix } \mathcal{A}^{-1} \qquad \text{[12:39]}$$


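When the matrix is of predictor variables only, we use a notation similar to the preceding, substituting $R$ for $A$. Thus,

$$\mathcal{R} = \begin{pmatrix} 1 & r_{12} & r_{13} \\ r_{12} & 1 & r_{23} \\ r_{13} & r_{23} & 1 \end{pmatrix} \qquad \text{Predictor matrix} \qquad \text{[12:40]}$$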


The meanings of $\mathcal{R}'$, $\mathcal{R}^{-1}$, $R$, $R'$, $R^{-1}$, $r_{ij}$ (the element in $\mathcal{R}$, or $R$), $R_{ij}$ (its cofactor), and $r^{ji}$ (the element in $\mathcal{R}^{-1}$, or $R^{-1}$) will be obvious.

$$r^{ji} = \frac{R_{ij}}{R} \qquad \text{Elements of the inverse matrix } \mathcal{R}^{-1} \qquad \text{[12:41]}$$

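The vector matrix of regression coefficients is $\beta$,

$$\beta = \begin{pmatrix} \beta_{01.23} \\ \beta_{02.13} \\ \beta_{03.12} \end{pmatrix} \qquad \text{[12:42]}$$

The vector matrix of correlations with the predictand is $\mathcal{A}_1$,

$$\mathcal{A}_1 = \begin{pmatrix} r_{01} \\ r_{02} \\ r_{03} \end{pmatrix}, \qquad \text{and its transpose} \qquad \mathcal{A}_1' = \begin{pmatrix} r_{01} & r_{02} & r_{03} \end{pmatrix} \qquad \text{[12:43]}$$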


The following relationships, which hold for any number of variables, will not be proven, but merely illustrated with the four-variable data of Section 2.

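$$\beta = \mathcal{R}^{-1}\mathcal{A}_1 \qquad \text{Yielding the matrix of } \beta \text{ regression coefficients} \qquad \text{[12:44]}$$

$$1 - r^2_{0\wedge 123} = k^2_{0.123} = \frac{A}{R} \qquad \text{[12:45]}$$

$$r^2_{0\wedge 123} = \beta'\mathcal{A}_1 \qquad \text{[12:46]}$$

$$V(\beta_i) = \frac{r^{ii}\,k^2_{0.123}}{N-4} \qquad \text{Variance error of a } \beta \text{ multiple regression coefficient, 4 variables} \qquad \text{[12:47]}$$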
For the general case we have

$$V(\beta_i) = \frac{r^{ii}\,k^2_{0.12\ldots n}}{N-n-1} \qquad \text{[12:48]}$$

$$V(\beta_i) = \frac{A\,R_{ii}}{R^2\,(N-n-1)} \qquad \text{[12:48a]}$$

Variance error of a $\beta$ multiple regression coefficient, $(n+1)$ variables. (See also [12:34].)


A derivation of this follows directly from 
the a l formula on page 600 of Fisher (1922). 


We also have

$$V(b_i) = \frac{V_0}{V_i}\,V(\beta_i) \qquad \text{Variance error of a multiple regression coefficient (see also [12:34])} \qquad \text{[12:49]}$$
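The matrix relations [12:44] to [12:48] can be traced with a few lines of Python. This sketch is an illustration under stated assumptions, not the author's procedure: it inverts $\mathcal{R}$ by Gauss-Jordan elimination (rather than through the adjoint, as in the worked example of this Section), uses the four-variable correlations with $N = 125$, and its helper names `inverse` and `matmul` are hypothetical.

```python
def inverse(m):
    """Inverse of a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [x / p for x in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]


def matmul(a, b):
    """Ordinary matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


R = [[1.000, 0.404, 0.231],
     [0.404, 1.000, 0.060],
     [0.231, 0.060, 1.000]]
A1 = [0.274, 0.225, 0.134]        # r_01, r_02, r_03: correlations with the criterion
N, n = 125, 3

R_inv = inverse(R)                                                      # elements r^{ji}
beta = [sum(R_inv[i][j] * A1[j] for j in range(n)) for i in range(n)]  # [12:44]
r2 = sum(b * r for b, r in zip(beta, A1))                               # [12:46]
k2 = 1.0 - r2                                                           # [12:45]
var_beta = [R_inv[i][i] * k2 / (N - n - 1) for i in range(n)]           # [12:48]
check = matmul(R, R_inv)          # arithmetic check: should be the identity matrix

print([round(b, 6) for b in beta], round(k2, 6))
print([round(v, 6) for v in var_beta])
```

Given the standard deviations of the variables, [12:49] then converts each $V(\beta_i)$ into $V(b_i)$ by the factor $V_0/V_i$.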


The significance of the difference between two of the $\beta$ coefficients entering into a single regression equation is sometimes desired. These are not independent, so a covariance term is present. The general equation is:

$$V(\beta_i - \beta_j) = \frac{\left(r^{ii} + r^{jj} - 2r^{ij}\right)k^2_{0.12\ldots n}}{N-n-1} \qquad \text{Variance error of difference between } \beta \text{ regression coefficients, } (n+1) \text{ variables} \qquad \text{[12:50]}$$

Further,

$$F_{1,(N-n-1)} = \frac{(\beta_i - \beta_j)^2}{V(\beta_i - \beta_j)} \qquad \text{Variance ratio to test the significance of } (\beta_i - \beta_j) \qquad \text{[12:51]}$$

Inherent in this equation is the relationship

$$r_{\beta_i\beta_j} = \frac{r^{ij}}{\sqrt{r^{ii}\,r^{jj}}} \qquad \text{[12:52]}$$

Since the criterion variable does not enter into the right-hand member, we note that the correlation between regression coefficients is independent of the criterion, or predictand.

Under conditions of sampling such that $V_i$ and $V_j$ do not vary from sample to sample, we have

$$r_{b_ib_j} = r_{\beta_i\beta_j} \qquad \text{Correlation between regression coefficients} \qquad \text{[12:53]}$$

The data of the four-variable problem used in the modified Doolittle solution are now employed to illustrate the solution by matrices and determinants.

Using the values given in the augmented correlation matrix $\mathcal{A}$, Table XII D, we evaluate it as a determinant and find that $A = .714545$.

The predictor matrix $\mathcal{R}$, in which the elements are designated $r_{ij}$, is

$$\mathcal{R} = \begin{pmatrix} 1.000 & .404 & .231 \\ .404 & 1.000 & .060 \\ .231 & .060 & 1.000 \end{pmatrix}$$

the rows and columns corresponding to variables (1), (2), and (3). Evaluating it as a determinant we obtain $R = .791022$. Utilizing [12:28] we obtain $r^2_{0\wedge 123} = .0967$, to be compared with the value .0966 obtained in Section 2.

The cofactors of the elements in $R$ are $R_{ij}$, as recorded in the adjoint matrix herewith:


$$\begin{pmatrix} .996400 & -.390140 & -.206760 \\ -.390140 & .946639 & .033324 \\ -.206760 & .033324 & .836784 \end{pmatrix}$$

Dividing each of these by $R$ we obtain the elements $r^{ji}$ of the inverse matrix $\mathcal{R}^{-1}$ herewith:
$$\mathcal{R}^{-1} = \begin{pmatrix} 1.259636 & -.493210 & -.261383 \\ -.493210 & 1.196729 & .042128 \\ -.261383 & .042128 & 1.057852 \end{pmatrix}$$


Matrix multiplication yields the identity matrix, which is useful as a check upon the arithmetic accuracy of the work:

$$\mathcal{R}\,\mathcal{R}^{-1} = \begin{pmatrix} 1.000000 & .000000 & .000001 \\ .000000 & 1.000000 & .000000 \\ .000001 & .000000 & 1.000000 \end{pmatrix}$$

The matrix of correlations with the criterion is

$$\mathcal{A}_1 = \begin{pmatrix} r_{01} \\ r_{02} \\ r_{03} \end{pmatrix} = \begin{pmatrix} .274 \\ .225 \\ .134 \end{pmatrix}$$


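The $\beta$ regression coefficients matrix is

$$\beta = \mathcal{R}^{-1}\mathcal{A}_1 = \begin{pmatrix} .199143 \\ .139770 \\ .079612 \end{pmatrix} = \begin{pmatrix} \beta_{01.23} \\ \beta_{02.13} \\ \beta_{03.12} \end{pmatrix}$$

Since $b_i = \beta_i\,\sigma_0/\sigma_i$, we utilize the $\sigma$ values of Table XII B and obtain

$$b_1 = \beta_1\frac{\sigma_0}{\sigma_1} = .217134; \qquad b_2 = \beta_2\frac{\sigma_0}{\sigma_2} = .119901; \qquad b_3 = \beta_3\frac{\sigma_0}{\sigma_3} = .058661$$

which agree with the values obtained from the modified Doolittle solution, within the rounding to be expected since three- and four-figure $\sigma$ values were employed.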

The $r^{ij}$ values in $\mathcal{R}^{-1}$ provide us with the necessary information for testing the significance of the difference between two $\beta$'s, formula [12:51]. For example:

$$(\beta_1 - \beta_2) = .0593$$

and

$$k^2_{0.123} = \frac{A}{R} = \frac{.714545}{.791022} = .903319$$

so that

$$V(\beta_1 - \beta_2) = \frac{[1.259636 + 1.196729 - 2(-.493210)](.903319)}{125 - 4} = .0257,$$

or $\sigma_{(\beta_1 - \beta_2)} = .16$.

Accordingly, the difference .0593, being only about .37 of its standard error, is not trustworthy.
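The arithmetic of this example, together with the correlation [12:52] between the two $\beta$'s, can be reproduced as below. The inputs are the values printed in this Section; the variable names are illustrative.

```python
import math

# Values from the four-variable example: elements r^{ij} of R^{-1}, the betas, k^2.
r11, r22, r12 = 1.259636, 1.196729, -0.493210
beta1, beta2 = 0.199143, 0.139770
k2, N, n = 0.903319, 125, 3

var_diff = (r11 + r22 - 2.0 * r12) * k2 / (N - n - 1)   # [12:50]
sigma_diff = math.sqrt(var_diff)
F = (beta1 - beta2) ** 2 / var_diff                     # [12:51], df = (1, N - n - 1)
r_beta12 = r12 / math.sqrt(r11 * r22)                   # [12:52]

print(round(var_diff, 4), round(sigma_diff, 2), round(F, 3), round(r_beta12, 3))
```

An $F$ of about .14 with 1 and 121 degrees of freedom is far below the customary 5 per cent point (roughly 3.9), in keeping with the conclusion just drawn. The negative $r_{\beta_1\beta_2}$ (about $-.40$) is why the covariance term enlarges $V(\beta_1 - \beta_2)$ here.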


CHAPTER XIII 


SUNDRY STATISTICAL ISSUES AND PROCEDURES 

section 1. EVIDENCE OF PERIODICITY IN SHORT TIME SERIES
(An abridgment of Kelley, 1943) 

Many time series are of short duration be- 
cause the recording of the event has, of neces- 
sity, proceeded for but a short time. Though it 
may be necessary that the student bide his time 
and wait for the years to pass in order to se- 
cure sufficient data, we should come to this 
conclusion only after having exhausted the pos- 
sibilities of small sample theory. A method 
which applies to short series and which enables 
a judgment of the hypothesis with fiducial 
limits is important in that, whatever the time 
span, the issue can be investigated and the hy- 
pothesis substantiated or rejected within the 
fiducial limits set by the investigator. 

Let the time variable be $X_1$ and let $X_0$ be the variable which we postulate to be a periodic function of $X_1$. Before investigating periodicity we should first take out any linear, quadric, or other non-periodic trend that may exist. We pause to note that the inclusion of "quadric, or other non-periodic" is a broadening, but we believe justifiable extension, of the usual definition of trend as that allowance which has to be made for the time such that the residuals shall be uncorrelated with the time.

For the purposes of precise test it is advantageous to have the trend represented by a line that places linear restrictions only upon $X_0$, as do the parabolic regression constants of Chapter XI, Section 4. Consider the following series of estimates of $X_0$:

$$\text{Estimate of } X_0 = M_0 \qquad \text{Zero order regression} \qquad \text{[13:01]}$$

$$\tilde X_{0\wedge 1} = a + bX_1 \qquad \text{Linear regression} \qquad \text{[13:02]}$$

$$\tilde X_{0\wedge 1,1^2} = a + bX_1 + cX_1^2 \qquad \text{Quadric regression} \qquad \text{[13:03]}$$

$$\tilde X_{0\wedge 1,1^2,1^3} = \alpha + \beta X_1 + \gamma X_1^2 + \delta X_1^3 \qquad \text{Cubic regression} \qquad \text{[13:04]}$$

We test the need of [13:02] rather than [13:01], that is, we test the hypothesis $b = 0$, as in Chapter XI, Section 4. Or again we test the need of [13:03] rather than [13:02], that is, the hypothesis $c = 0$, etc. When no trend is removed, [13:01] provides the estimate of $X_0$, and the quantities related to $X_1$ in which we seek periodicity are $(X_0 - M_0)$. These have $N-1$ degrees of freedom. If a linear trend is removed by [13:02], the residual quantities, having $N-2$ degrees of freedom, in which we seek periodicity are

$$X_{0.1} = X_0 - \tilde X_{0\wedge 1}$$

When a quadric trend is removed by [13:03], the residuals are $X_{0.1,1^2}$ and they have $N-3$ degrees of freedom.
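A least-squares fit of the trends [13:01] to [13:04] can be sketched as follows. The helper names `solve`, `fit_trend`, and `published_quadric_trend` are hypothetical. The fit solves the normal equations in $x = X_1 - $ center, which is adequate for a short series such as 21 years, and the residual degrees of freedom are $N$ minus the number of fitted constants, as in the text.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_trend(t, y, degree, center=0.0):
    """Least-squares trend y ~ c0 + c1*x + ... + c_d*x**d with x = t - center.
    Degrees 0, 1, 2, 3 correspond to [13:01] to [13:04].
    Returns (coefficients, residuals, residual degrees of freedom)."""
    x = [ti - center for ti in t]
    p = degree + 1
    xtx = [[sum(xi ** (i + j) for xi in x) for j in range(p)] for i in range(p)]
    xty = [sum(xi ** i * yi for xi, yi in zip(x, y)) for i in range(p)]
    coef = solve(xtx, xty)
    fitted = [sum(c * xi ** i for i, c in enumerate(coef)) for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    return coef, residuals, len(y) - p


def published_quadric_trend(year):
    """The quadric birth trend reported in [13:09], with x1 = year - 1930."""
    x = year - 1930
    return 2118995 + 30929 * x - 1877.9 * x ** 2


print([round(published_quadric_trend(yr) / 100) for yr in (1920, 1930, 1940)])
```

Evaluating the reported trend [13:09] at 1920, 1930, and 1940 and dividing by 100 gives 16219, 21190, and 22405, the corresponding entries of column 3 of Table XIII A.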

A similar procedure, equation [13:04], could remove a cubic trend, but it should be used with caution, as a cubic is itself a close approximation to a complete cycle of a certain sine curve. The cubic curve of Chart XIII I, which cuts the quadric much as would a sine curve with a period of 16 years, is an illustration of this.

[Chart XIII I. Births in United States, 1920-1940. Horizontal axis: year, 1920 to 1940.]


Using the data in the first two columns of Table XIII A, we find a quadric trend ($c$ of [13:03] $\neq 0$) and remove it, obtaining residuals $X_{0.1,1^2}$, which may be investigated for periodicity. We employ a periodogram technique with two modifications. The usual periodogram is as shown in the dotted line of Chart XIII II. The raw correlation ratio squared, $\eta^2$, is plotted for successively greater periods, $T$. The argument is that in the neighborhood of the period yielding the largest $\eta^2$ will be found the actual period of the data. For this argument to be precise it is necessary that a probability statement be attached to the outcome. Thus in addition to, or instead of, the $\eta^2$ ordinate in the periodogram we employ the ordinate $(1-P)$, the $P$ being that from a variance ratio: the numerator variance is that of the means of the classes into which the data have been grouped when items of the same phase for the period $T$ constitute a group, and the denominator variance is the customary residual, or error, variance.

A briefer process, yielding essentially the same picture, as illustrated in Chart XIII II, is to use as the ordinate the unbiased correlation ratio squared, $\varepsilon^2$. There lies in $\varepsilon^2$ the additional value that it gives a quantitative statement of the correlation between time, treated as a periodical variable of period $T$, and the magnitudes being investigated.


[Chart XIII II. Periodogram. Horizontal axis: period in years, 2 to 18.]

We start with $N$ equally time-spaced observations $X_0$. We find that a quadric trend exists by the method of Chapter XI, Section 4, and remove it, giving $N$ residuals, $X_{0.1,1^2}$, having $N-3$ degrees of freedom.



SUNDRY ISSUES 
TABLE XIII A 

BIRTHS IN THE UNITED STATES 1920-1940 
(World Almanac, 1943, p. 472) 
| Year ($X_1$) | Births ($X_0$) | $\tilde X_{0\wedge 1,1^2}$ ÷ 100 |
|---|---|---|
| 1920 | 1508874 | 16219 |
| 1921 | 1714261 | 16885 |
| 1922 | 1774911 | 17514 |
| 1923 | 1792646 | 18105 |
| 1924 | 1930614 | 18658 |
| 1925 | 1878880 | 19174 |
| 1926 | 1856068 | 19652 |
| 1927 | 2137836 | 20093 |
| 1928 | 2233149 | 20496 |
| 1929 | 2169920 | 20862 |
| 1930 | 2203953 | 21190 |
| 1931 | 2112760 | 21480 |
| 1932 | 2074042 | 21733 |
| 1933 | 2081232 | 21949 |
| 1934 | 2167636 | 22127 |
| 1935 | 2155105 | 22267 |
| 1936 | 2144790 | 22370 |
| 1937 | 2203337 | 22435 |
| 1938 | 2285962 | 22462 |
| 1939 | 2265558 | 22453 |
| 1940 | 2360399 | 22405 |




Let the period under investigation be $T$ of the equally spaced time intervals. Of necessity $1 < T < (N-2)$, and the practical limits of utility of $T$, fractional or integral, may be taken such that $2 \le T < (N-2)$, while if a zero trend, equation [13:01], is involved, $2 \le T < (N-1)$.

If $N$ is exactly divisible by $T$, there are $T$ classes, each having $N/T$ measures, $X_{0.1,1^2}$, of the same phase in it. If $N$ is not exactly divisible by $T$, we have a variable number of measures from phase to phase, and the number of classes will be some number, which we call $k$, greater than $N/T$. Let the means of the $X_{0.1,1^2}$ measures in these classes be $M_{0a}, M_{0b}, \ldots, M_{0k}$. The variance of these means, appropriately weighted with the number of measures in each class, is designated $V(M_{0k})$. The mean for each class is independent of that for every other class except that the mean of the means is equal to zero, so that there are just $(k-1)$ degrees of freedom. Letting $i$ take all values from $a$ to $k$, we can express the variable $X_{0.1,1^2}$ as equal to the sum of two independent parts. We have the following relationships:


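$$X_{0.1,1^2} = M_{0i} + X_{0.1,1^2,i} \qquad \text{[13:05]}$$

$$N - 3 = (k-1) + (N-2-k) \qquad \text{D.F. equation} \qquad \text{[13:06]}$$

$$V_{0.1,1^2} = V(M_{0k}) + V_{0.1,1^2,i} \qquad \text{Analysis of variance} \qquad \text{[13:07]}$$

$$F_{(k-1),(N-2-k)} = \frac{V(M_{0k})\,(N-2-k)}{V_{0.1,1^2,i}\,(k-1)} \qquad \text{[13:08]}$$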


Testing periods which are fractional introduces no complication if proper allowance is made for the decrease in $N$ because certain $X_0$ values are not used.
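The analysis of variance [13:05] to [13:08] for one trial period can be sketched as below. Assumptions: $T$ is an integer (fractional periods are not handled), the item at time $t$ is assigned to phase class $t \bmod T$, variances use the divisor $N$ as in the text's $V$ notation, and `trend_order` (2 for the quadric) sets the degrees of freedom already used by the trend. The function names are hypothetical.

```python
def phase_classes(values, T):
    """Group a series into phase classes for an integer period T:
    items t, t + T, t + 2T, ... fall in the same class."""
    classes = [[] for _ in range(T)]
    for t, v in enumerate(values):
        classes[t % T].append(v)
    return [c for c in classes if c]


def periodicity_test(residuals, T, trend_order=2):
    """[13:05] to [13:08] for an integer period T.  The residuals are assumed to
    carry N - 1 - trend_order degrees of freedom after the trend is removed."""
    N = len(residuals)
    classes = phase_classes(residuals, T)
    k = len(classes)
    mean = sum(residuals) / N
    v_total = sum((x - mean) ** 2 for x in residuals) / N
    # Variance of the class means, each weighted by its class size: V(M_0k).
    v_between = sum(len(c) * (sum(c) / len(c) - mean) ** 2 for c in classes) / N
    v_within = v_total - v_between                      # [13:07]
    df1, df2 = k - 1, N - 1 - trend_order - (k - 1)     # [13:06]: N - 2 - k for a quadric
    F = (v_between / df1) / (v_within / df2)            # [13:08]
    return {"k": k, "sizes": [len(c) for c in classes], "eta2": v_between / v_total,
            "F": F, "df": (df1, df2), "v_within": v_within}
```

With 21 residuals and $T = 2$ the classes have 11 and 10 members and the degrees of freedom are 1 and 17, matching the two-year case discussed in the text; $P$ would then be read from an $F$ table.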

The columns of Table XIII A in order provide:

Column 1: Year.
Column 2: Number of births.
Column 3: $\tilde X_{0\wedge 1,1^2}$, as by [13:03].
Column 4: $X_{0.1,1^2}$, residual deviations to be tested for periodicity.
Column 5: Column 4 values opposite $a$ are in the same phase and thus constitute one class; values opposite $b$ constitute the second class when the period being investigated is two years.
Columns 6, 7, ..., 21: similar for periods of 3, 4, ..., 18 years.

At the feet of columns 5, 6, ..., 21 are recorded, in order, $\eta^2$, $F_{(k-1),(N-2-k)}$, the corresponding $P$, and $\varepsilon^2$.


We first investigate trend. $X_0$ = number of births per year, $X_1$ = date, and for convenience we let $x_1 = X_1 - 1930$. We find:

$$M_0 = 2050140, \qquad V_0 = 4617948 \times 10^4$$

$$\tilde X_{0\wedge 1} = 2050140 + 30929\,x_1, \qquad r^2_{01} = .75956$$

To test the significance of the deviation of 30929 from zero we compute

$$F_{1,19} = \frac{V_0\,r^2_{01}\,(N-2)}{V_0\left(1 - r^2_{01}\right)} = 60.022$$

which yields a very small $P$. We next fit a quadric regression line, obtaining

$$\tilde X_{0\wedge 1,1^2} = 2118995 + 30929\,x_1 - 1877.9\,x_1^2, \qquad r^2_{0\wedge 1,1^2} = .84113 \qquad \text{[13:09]}$$

$$F_{1,18} = \frac{\left(V_0\,r^2_{0\wedge 1,1^2} - V_0\,r^2_{01}\right)(N-3)}{V_0\left(1 - r^2_{0\wedge 1,1^2}\right)} = 9.2419$$

yielding $P = .048$. Having odds of 952 to 48 that $-1877.9$ is not a chance deviation from zero, we take [13:09] as the trend line and compute residuals

$$X_{0.1,1^2} = X_0 - \tilde X_{0\wedge 1,1^2}$$

as recorded in column 4 of Table XIII A.

We give herewith a cubic regression line, not as a part of the procedure of determining periodicity, but because it is interesting to compare such a line with the evidence available as to periodicity.

$$\tilde X_{0\wedge 1,1^2,1^3} = 2118994 + 11149\,x_1 - 1877.8\,x_1^2 + 300.61\,x_1^3, \qquad r^2_{0\wedge 1,1^2,1^3} = .89917$$

$$F_{1,17} = \frac{\left(V_0\,r^2_{0\wedge 1,1^2,1^3} - V_0\,r^2_{0\wedge 1,1^2}\right)(N-4)}{V_0\left(1 - r^2_{0\wedge 1,1^2,1^3}\right)} = 9.7856$$

yielding $P = .045$. This establishes the non-chance nature of the coefficient 300.61. We will later note that we do not establish periodicity with the certainty with which we here establish the fitness of a cubic regression line.
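Because $V_0$ cancels, each variance ratio for adding one trend term depends only on the successive $r^2$ values. A sketch (hypothetical function name `incremental_f`) reproducing the three $F$'s given above:

```python
def incremental_f(r2_reduced, r2_full, N, p_full, q=1):
    """F for adding q trend terms: a regression on p_full - q powers of x1
    (multiple r^2 = r2_reduced) against one on p_full powers (r2_full).
    Degrees of freedom are (q, N - p_full - 1); V0 cancels and is omitted."""
    df2 = N - p_full - 1
    F = ((r2_full - r2_reduced) / q) / ((1.0 - r2_full) / df2)
    return F, (q, df2)


N = 21
print(incremental_f(0.0, 0.75956, N, 1))       # linear against no trend
print(incremental_f(0.75956, 0.84113, N, 2))   # quadric against linear
print(incremental_f(0.84113, 0.89917, N, 3))   # cubic against quadric
```

The sketch computes only $F$ and its degrees of freedom; the corresponding $P$ is to be read from a table of the variance ratio.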

To test for the existence of a two-year period the residuals of column 4 are to be grouped into two classes, $a$ and $b$, as indicated in column 5. The $a$ class has 11 measures and yields a mean, $M_{0a}$, and the 10 measures of class $b$ yield a mean, $M_{0b}$. Weighting these means 11 and 10 respectively and computing their variance, we obtain $V(M_{02})$, the subscript 2 being $k$, the number of classes involved. Analyzing variance as indicated in [13:07] and employing [13:08], we have $F_{1,17} = .073$, yielding $P = .825$, which, being greater than .5, yields no evidence whatever that a period of two years exists. Similar computations for $T = 3, T = 4, \ldots, T = 18$ have been made and are recorded at the feet of the appropriate columns of the table. When $T = 16$, $P$ is .152. Thus the odds are about 5 to 1 that a period in the neighborhood of 16 years exists. The $F$ for this case is an $F_{15,3}$; we have but 3 degrees of freedom available in the error variance. Should a real period of 16 years exist, it is not surprising that data covering but 21 years are unable to establish it with satisfactory certainty.

The advantages of employing $(1-P)$ in lieu of $\eta^2$ in plotting and interpreting a periodogram wherein the number of degrees of freedom is small are exemplified in Chart XIII II.

To a degree the advantages of $(1-P)$ are also inherent in $\varepsilon^2$, as is indicated by the periodogram curve based upon $\varepsilon^2$. The derivation of $\varepsilon^2$ in Chapter XI, Section 5, gave the relationship

$$\varepsilon^2 = 1 - \frac{N-1}{N-k}\left(1 - \eta^2\right) \qquad \text{See [11:117]}$$

but this was postulated upon using a zero order regression equation, [13:01]. When a linear trend is taken out the relationship is

$$\varepsilon^2 = 1 - \frac{N-2}{N-1-k}\left(1 - \eta^2\right) \qquad \text{[13:10]}$$

If a quadric trend is removed, we have

$$\varepsilon^2 = 1 - \frac{N-3}{N-2-k}\left(1 - \eta^2\right) \qquad \text{[13:11]}$$


and so forth for the removal of trends of higher parabolic order. The values of $\varepsilon^2$ given in the table were computed by [13:11]. The median $\varepsilon^2$ value of the table is .012, which differs but slightly from .000, the value to be expected if there is no period in the data.
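Formulas [11:117], [13:10], and [13:11] follow one pattern in the order $q$ of the parabolic trend removed, namely $\varepsilon^2 = 1 - \frac{N-1-q}{N-q-k}(1-\eta^2)$, which is one reading of "and so forth". A sketch with a hypothetical function name:

```python
def epsilon2(eta2, N, k, trend_order=0):
    """Unbiased correlation ratio squared after removing a parabolic trend of
    order q = trend_order: q = 0 gives [11:117], q = 1 gives [13:10],
    and q = 2 gives [13:11]."""
    q = trend_order
    return 1.0 - (N - 1 - q) / (N - q - k) * (1.0 - eta2)


# Hypothetical example: eta^2 = .20 for a two-year period in 21 quadric residuals.
print(round(epsilon2(0.20, 21, 2, trend_order=2), 4))
```

$\varepsilon^2$ is exactly zero when $\eta^2 = (k-1)/(N-1-q)$, the share of the residual degrees of freedom carried by the $k-1$ class means, so a periodogram in $\varepsilon^2$ hovers about zero when no period is present.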

The periodogram analysis here discussed may be thought of as terminating the problem only in case no period is discovered. If the analysis does establish one or more periods, their further specification could be accomplished by fitting, say, a Fourier series, by a method which is appropriate when the time span of the data is not an exact multiple of the period, or periods.


section 2. LEAD AND LAG IN TIME SERIES 

The attempt is frequently made to predict the fluctuations of one time series from those of another. Let us consider two situations in which, except for the aberrations which operate like chance, (a) time series A and time series B fluctuate concomitantly, and (b) time series A tends to lag behind time series B by a constant time interval.

In the (a) situation the correlation between the two series may be of interest in a historical study, and it may even be of interest in a prediction study if it is easy to secure the information for one of the series and difficult for the other. Thus, if steel prices, which are relatively easy to get, neither lead nor lag behind heavy industry prices, which are hard to get, it obviously may be of advantage to use the first to predict the second.

Knowledge of the correlation obtaining in a (b) situation is generally even more useful, for the fluctuations of the leading variable may be used to predict the future fluctuations of the lagging variable; for example, rainfall in June might be most useful in predicting harvest in August. The three chief statistical issues of this problem are (1) by what amount of time does the predictand variable lag behind the predictor, (2) what is the correlation between them, and (3) what are the standard errors, or the fiducial limits, of the time lag measure and of the correlation measure. The third issue has, as yet, no adequate solution. A sequential analysis based upon correlated data or the employment of subsamples seems called for, but these are beyond the scope of this text.* An analysis yielding a solution to the first two issues will be illustrated in connection with the data of Table XIII B.

Let $X_0$ be the predictand, $X_1$ the predictor, and $X_2$ the time variable. There is probably some trend in the $X_0$ series and the same or a different trend in the $X_1$ series. We can take out the trends involved by the method of the preceding Section and correlate the residual variables, pairing measures corresponding to the same moment of time and again pairing them with some time lag. This process might wholly or partially eliminate the measure of lag that concerns us. Suppose the trends are $\tilde X_{0\wedge 2} = f(X_2)$ and $\tilde X_{1\wedge 2} = f(X_2 + \Delta)$, in which the function of $X_2$ in the first equation is identical with the function of $X_2 + \Delta$ in the second. Clearly $\Delta$ is exactly the measure of lag which we wish, but it has been removed in the taking out of the trends, and it will not again reveal itself by a study of the correlations, with various lags, between the residual measures $X_{0.2}$ and $X_{1.2}$. There is promise in the study of lag in an approach which requires that $\tilde X_{0\wedge 2}$ and $\tilde X_{1\wedge 2}$ be identical functions, the first of $X_2$ and the second of $X_2 + \Delta$, if we first express the predictand and the predictor variables in comparable units. Comparability of measuring units is generally involved in any study of lag.

An obvious way to attempt to secure comparability is to deal with standard scores, as used in [13:13]. This approach may be expected to be most serviceable in connection with variables for which we have no indubitable zero points.

In the case of prices we have such a point, and equal relative changes in prices are in general equally important and meaningful. In this case the equivalence of ratios method, [13:15], is fruitful. A still better procedure in this case of an indubitable zero point is to let $X_0$ and $X_1$ be the logarithms of the gross measures. Though this logarithmic transformation is appropriate to the "all foods" and the "beverages and chocolate" price indexes of Table XIII B, we will follow the more common practice of taking these price ratios as they stand for the $X_0$ and $X_1$ variables.

We pose the question: are "all foods" retail prices a bellwether for "beverages and chocolate" retail prices and, if so, by what period of time do the latter lag, and what is the correlation between them?

Pairing $X_0$ and $X_1$ items as in Table XIII B is a pairing with zero lag, and it provides a sample of 84 paired measures. Pairing February $X_0$ with January $X_1$, March $X_0$ with February $X_1$, etc., is the one month lag pairing, and the number in the sample is 83. Pairing March $X_0$ with January $X_1$, etc., provides the 82 cases in the two month lag


TABLE XIII B
RETAIL PRICES, 1929-35

U. S. Department of Labor, Bureau of Labor Statistics, Serial no. R 384, Date, 1938.

51 large U. S. cities: Average 1923-25 = 100.

$X_0$ = "beverages and chocolate", and $X_1$ = "all foods."

| Month | 1929 $X_0$ | 1929 $X_1$ | 1930 $X_0$ | 1930 $X_1$ | 1931 $X_0$ | 1931 $X_1$ | 1932 $X_0$ | 1932 $X_1$ | 1933 $X_0$ | 1933 $X_1$ | 1934 $X_0$ | 1934 $X_1$ | 1935 $X_0$ | 1935 $X_1$ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Jan. | 110.7 | 102.7 | 101.3 | 104.5 | 90.2 | 89.2 | 78.4 | 72.8 | 71.1 | 62.5 | 68.7 | 70.5 | 73.5 | 77.4 |
| Feb. | 110.8 | 102.3 | 99.1 | 103.4 | 89.4 | 86.0 | 77.4 | 70.5 | 69.5 | 60.1 | 69.6 | 72.5 | 73.3 | 79.7 |
| Mar. | 110.9 | 101.4 | 97.9 | 102.0 | 87.3 | 85.1 | 76.9 | 70.7 | 68.5 | 59.3 | 70.6 | 72.6 | 72.3 | 79.7 |
| Apr. | 111.0 | 100.8 | 97.2 | 103.3 | 84.0 | 83.9 | 76.0 | 70.3 | 68.4 | 60.1 | 71.4 | 72.2 | 71.4 | 81.6 |
| May | 110.8 | 102.4 | 96.1 | 102.6 | 82.4 | 82.6 | 74.9 | 68.5 | 67.7 | 62.5 | 72.1 | 73.0 | 70.8 | 81.4 |
| Jun. | 110.5 | 103.7 | 95.6 | 101.2 | 82.0 | 80.5 | 74.4 | 67.6 | 67.3 | 64.9 | 72.2 | 73.2 | 70.4 | 81.7 |
| Jul. | 110.6 | 103.5 | 95.7 | 97.5 | 81.1 | 80.7 | 74.2 | 68.3 | 67.4 | 71.0 | 72.1 | 73.5 | 69.9 | 79.9 |
| Aug. | 110.4 | 108.1 | 95.0 | 96.6 | 81.1 | 80.9 | 73.7 | 67.1 | 67.7 | 72.0 | 72.4 | 75.2 | 69.3 | 79.6 |
| Sep. | 110.2 | 108.0 | 93.8 | 98.3 | 81.0 | 80.6 | 74.6 | 66.7 | 67.8 | 72.0 | 72.8 | 76.3 | 68.4 | 80.0 |
| Oct. | 110.1 | 107.6 | 92.9 | 97.8 | 80.4 | 79.9 | 74.5 | 65.3 | 68.4 | 71.2 | 73.1 | 75.8 | 68.0 | 80.2 |
| Nov. | 108.9 | 105.7 | 92.2 | 95.2 | 79.9 | 78.2 | 73.8 | 65.6 | 68.4 | 70.3 | 73.0 | 75.2 | 67.8 | 81.0 |
| Dec. | 105.3 | 105.7 | 91.9 | 92.1 | 79.4 | 75.2 | 72.8 | 64.7 | 68.0 | 69.7 | 73.3 | 74.6 | 67.6 | 82.2 |


series, etc. The correlations resulting from these different pairings yield a series whose maximum can be found by fitting a parabola, see [7:23], and the location of this maximum, i.e., the desired period of lag, is given by [7:24]. This process may be laborious, so the following short-cuts are noted. The difference correlation formula, [10:52],

$$r_{01} = \frac{V_0 + V_1 - V(X_0 - X_1)}{2\sqrt{V_0 V_1}}$$

shows that for $V_0$ and $V_1$ constant (and they are nearly constant in going from the sample of 84 to that of 83 to that of 82, etc.) the correlation is directly dependent upon the variance of the $(X_0 - X_1)$ differences, falling as that variance rises. Further, the easily computed (see [6:68]) average deviation of these differences may be substituted for the variance of the differences to give an order of magnitude.

Thus, first compute differences $(X_0 - X_1)$ for a succession of lags, then compute the average deviation for each of these sets, stopping when a minimum is obviously passed. For each lag in the neighborhood of this minimum compute the three values, $V_0$, $V_1$, and $V(X_0 - X_1)$, employed in the correlation formula [10:52], and compute the requisite correlations.
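The short-cut just described can be sketched in Python. Assumptions: $X_0$ at month $t + \text{lag}$ is paired with $X_1$ at month $t$, the average deviation is taken about the mean of the differences ([6:68] is not reproduced here), variances use the divisor $N$, and all function names are hypothetical. The final lines apply [10:52] to the $V$'s of Table XIII C.

```python
import math


def lagged_pairs(x0, x1, lag):
    """Pair X0 at month t + lag with X1 at month t (X0 lagging X1)."""
    return x0[lag:], x1[:len(x1) - lag]


def variance(v):
    m = sum(v) / len(v)
    return sum((x - m) ** 2 for x in v) / len(v)


def r_from_variances(v0, v1, vd):
    """Difference correlation formula [10:52]."""
    return (v0 + v1 - vd) / (2.0 * math.sqrt(v0 * v1))


def lag_constants(x0, x1, lag):
    """N, A.D. of (X0 - X1), V1, V0, V(X0 - X1), and r, as in Table XIII C."""
    a, b = lagged_pairs(x0, x1, lag)
    d = [p - q for p, q in zip(a, b)]
    md = sum(d) / len(d)
    ad = sum(abs(x - md) for x in d) / len(d)   # average deviation about the mean
    v0, v1, vd = variance(a), variance(b), variance(d)
    return {"N": len(d), "AD": ad, "V1": v1, "V0": v0, "Vd": vd,
            "r": r_from_variances(v0, v1, vd)}


# V0, V1, V(X0 - X1) for lags of 0 to 4 months, from Table XIII C.
table = [(211.259, 197.250, 36.897), (203.619, 199.634, 35.101),
         (195.404, 202.031, 33.571), (186.962, 204.491, 34.588),
         (177.950, 206.950, 36.017)]
rs = [r_from_variances(*row) for row in table]
print([round(r, 4) for r in rs])
```

The recomputed correlations agree with the last row of Table XIII C to within about .0002 and peak at the two-month lag.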

Performing these steps upon the data of Table 
XIII B yields the information given in Table 
XIII C. 


TABLE XIII C 

CONSTANTS FOUND FOR VARIOUS LAGS IN 
"BEVERAGES AND CHOCOLATE" 



| | 0 lag | 1 mo. lag | 2 mo. lag | 3 mo. lag | 4 mo. lag |
|---|---|---|---|---|---|
| $N$ | 84 | 83 | 82 | 81 | 80 |
| A.D. of $(X_0 - X_1)$ | 5.006 | 4.759 | 4.752 | 4.805 | 4.986 |
| $V_1$ | 197.250 | 199.634 | 202.031 | 204.491 | 206.950 |
| $V_0$ | 211.259 | 203.619 | 195.404 | 186.962 | 177.950 |
| $V(X_0 - X_1)$ | 36.897 | 35.101 | 33.571 | 34.588 | 36.017 |
| $r$ | .9103 | .9131 | .9156 | .9125 | .9092 |


The quadric best fitting the first four points (0 lag, .9103; 1 mo. lag, .9131; 2 mo. lag, .9156; 3 mo. lag, .9125) is given by [7:23]. It has a mode, as given by [7:21] and [7:24], at 1.808 months, revealing a lag in the retail prices of "beverages and chocolate" upon the retail prices of "all foods" of 54 days (reckoning a month as 30 days). The correlation with this lag, as given by [7:25], is .915. This completes the computation.
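The location of the maximum can be reproduced with an ordinary least-squares parabola. This sketch assumes that [7:23] supplies such a fit (its exact form, in Chapter VII, is not reproduced here), reckons a month as 30 days, and uses hypothetical helper names; Cramer's rule is used only because the normal equations are 3 by 3.

```python
def fit_parabola(xs, ys):
    """Least-squares y = a + b x + c x^2 from the 3x3 normal equations (Cramer's rule)."""
    s = [sum(x ** p for x in xs) for p in range(5)]                 # sums of x^0 .. x^4
    t = [sum(y * x ** p for x, y in zip(xs, ys)) for p in range(3)]
    m = [[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]]

    def det3(a):
        return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
                - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
                + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))

    d = det3(m)
    coef = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = t[r]          # replace one column by the right-hand side
        coef.append(det3(mc) / d)
    return coef                        # [a, b, c]


lags = [0, 1, 2, 3]                    # months
rs = [0.9103, 0.9131, 0.9156, 0.9125] # correlations from Table XIII C
a, b, c = fit_parabola(lags, rs)
best_lag = -b / (2.0 * c)              # abscissa of the maximum (c < 0)
best_r = a + b * best_lag + c * best_lag ** 2
print(round(best_lag, 3), round(best_r, 3), round(best_lag * 30))
```

This gives a maximum at about 1.808 months, or 54 days, with a correlation of about .915, agreeing with the values stated above.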

Serviceable standard errors could be gotten 
by repeating the previous process a number of 
times, using the data for other years, and ob- 
taining a distribution of lag measures and also 
a distribution of correlation coefficients. 

section 3. IMPOSED CONDITIONS 

Limits. A phenomenon common to the physical,
biological and social sciences is that some 
magnitude approaches a limit as time, or age, 
increases. The typical growth curves of biology 
have an adult limiting value which they approach 
asymptotically as age increases. A chemical re- 
action may approach a state of equilibrium in a 
very similar manner. A bond price may gradually 
approach parity, or some less readily defined 
value determined by general economic conditions, 
as time passes. Not uncommonly the forgetting 
and the learning curves that the psychologist 
encounters are of the same type. In all these 
cases the anticipated statistical problem is 
first, to determine the nature of the complete 
growth curve from a study of complete or nearly 
complete antecedent phenomena, second, to fit a 
curve of the established type to data for which 
the time sequence is incomplete and third, by 
extrapolation secure an estimate of the limiting 
value. 
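As one concrete illustration of the third step, suppose (an assumption; the text does not fix the type) that the established type curve is a modified exponential, y = L − cρᵗ, and that three readings are taken at equal time intervals. The limit L can then be extrapolated in closed form. The function name and the readings are hypothetical, and, as the text warns, the sketch says nothing about the standard error of L.

```python
def limit_from_three_points(y0, y1, y2):
    """Limiting value L of y_t = L - c * rho**t from three equally spaced readings.

    Assumes the type curve is a modified exponential; for that curve
    (y0*y2 - y1**2) / (y0 + y2 - 2*y1) equals L exactly.
    """
    denom = y0 + y2 - 2.0 * y1
    if denom == 0:
        raise ValueError("readings lie on a straight line; no finite limit is implied")
    return (y0 * y2 - y1 ** 2) / denom


# Hypothetical readings taken from y_t = 180 - 130 * 0.8**t at t = 0, 1, 2.
readings = [180 - 130 * 0.8 ** t for t in range(3)]
print(limit_from_three_points(*readings))
```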

For example, let us say, first, that a study of the height growth curves of a substantial number of human males who have been followed from infancy to adulthood reveals that the increase in height follows a certain pattern, or type, and second, that we have a record of growth in height of a particular boy. Then, third, we fit the type curve to the individual record and find the maximum, or adult, value of the fitted curve. This we take as the estimate of the prospective adult height of the individual. There are a number of more or less difficult statistical problems involved in each of the three steps: (1) the determination of the type curve; (2) the fitting of it to the individual record; and (3) not the determination of the adult value, which is very simple, but the determination of its standard error (or, better, of its entire distribution form), which is the essential measure of credibility to be attached to the limiting, or adult, value just determined.

The algebraic solution of these problems is 
beyond the scope of this elementary work, but 
the use of graphic devices will serve as a good 
introduction to the sundry issues and hazards 
which are present. 

Subordination. In medical practice certain syndromes have been found to be characteristic of certain diseases. Thus if a doctor suspects disease α he looks for symptoms A, B, C, and D, whereas if he suspects disease β he looks for symptoms B, E, F, and G. If the symptoms are qualitative, that is, definitely either present or absent, very adequate contingency methods employing punched or notched cards with machine or manual sorting may be employed. These have proven very serviceable in limiting and refining diagnoses. When the symptoms are quantitative, as, for example, hours of sleep, metabolic rate, or evidence of lassitude, and especially when the symptoms are in part qualitative and in part quantitative, the determination of an adequate procedure is much more involved. The problem is common to many fields. For example, the adaptability of crop to terrain is a complex function both of quantitative phenomena like rainfall and soil ingredients and of qualitative phenomena like the presence or absence of a freezing temperature, a cultivation practice, etc. Though the full handling of this question is beyond the scope of this text, the approach to it is through the statistics of contingency, multiple correlation and factor analysis.

SECTION 4. COMPARABLE MEASURES*

For two differently derived quantitative measures to be comparable it is logically necessary that, except for a chance factor, they measure the same underlying phenomena. It frequently happens in connection with social phenomena that a first score is in large part a measure of the same thing as a second, differently derived score, as, e.g., are two intelligence test scores. Both tests are designated by their authors "intelligence" tests, and both probably measure in large part the same underlying function, but in lesser part each measures a capacity found in the one but not in the other. Equating the scores of two such tests violates the logical requirements of equatable scores, but nevertheless it has been done frequently and with an outcome that has seemed serviceable. The matter of how similar two functions should be before they are equated is a problem that we leave to economists, sociologists and psychologists concerned with particular issues. We here assume that, except for chance factors in each of X₁ and X₂, the underlying function measures are identical.

The equivalence of successive percentiles method: If scores X₁ show a monotonic increase with increase in the underlying function, and similarly for scores X₂, and if the chance factors in each are negligible, then clearly the same percentile scores on X₁ and X₂ are equivalent. A table can be drawn up giving in neighboring columns the successive percentiles, P.01, P.02, ..., P.99, for X₁ and for X₂, and these pairs are equivalent scores. Or, in general, writing ₚP₁ and ₚP₂ for the p-th percentiles of X₁ and X₂,

$$ {}_{p}P_{1} \doteq {}_{p}P_{2} \qquad \text{Equivalent percentiles} \qquad [13:12] $$

This method does not require that the form of distribution of the X₁ scores be the same as that of the X₂ scores. In general the method is not serviceable when X₁ and X₂ have quite different reliabilities, as, e.g., would be the case if X₁ is an intelligence score on a 60-minute test and X₂ a score on a 10-minute test.

* A more detailed discussion with illustrations is given in Truman L. Kelley, STATISTICAL METHOD (1923), Chapter VI.
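A minimal sketch of the equivalent-percentile idea for two samples of scores follows. The interpolation rule (percentile ranks taken at plotting positions (i + .5)/n on the sorted scores, with linear interpolation between them) is an assumption, not something the text prescribes, and, as the method requires, chance factors are ignored. The function name is hypothetical.

```python
import numpy as np


def equipercentile_equate(score, x1, x2):
    """Map a score on X1 to the X2 score having the same percentile rank.

    Percentile ranks are read from the sorted samples at plotting positions
    (i + 0.5) / n, interpolating linearly between them (an assumed rule).
    """
    x1 = np.sort(np.asarray(x1, dtype=float))
    x2 = np.sort(np.asarray(x2, dtype=float))
    p1 = (np.arange(x1.size) + 0.5) / x1.size
    p2 = (np.arange(x2.size) + 0.5) / x2.size
    rank = np.interp(score, x1, p1)         # percentile rank of the score within X1
    return float(np.interp(rank, p2, x2))   # X2 score standing at that same rank


# Hypothetical samples: X2 is a monotonic (here quadratic) rescaling of X1.
x1 = np.arange(1, 101)
print(equipercentile_equate(30, x1, x1 ** 2))
```

Because only ranks are matched, the equated score respects any monotonic relation between the two scales, which is why the method does not need the two distributions to have the same form.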

The standard score method: This method derives from the practices of Francis Galton (who, however, used medians and quartile deviations), and it asserts that standard scores, as defined in [8:23], of equal magnitude are equivalent. In addition to the requirement that the underlying functions are the same, this procedure requires that M₁ ≐ M₂; that σ₁ ≐ σ₂; that the form of distribution of X₁ is the same as that of X₂; and that the relative chance errors in X₁ and X₂ are equal, namely that σ₁.∞/σ₁ = σ₂.∞/σ₂. Thus X₁ ≐ X₂ when

$$ \frac{X_1 - M_1}{\sigma_1} = \frac{X_2 - M_2}{\sigma_2} \qquad \text{Equivalent standard scores} \qquad [13:13] $$
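Applied to a single score, [13:13] is a linear conversion from the X₁ scale to the X₂ scale. A minimal sketch; the function name and the test statistics in the example are hypothetical.

```python
def equivalent_standard_score(x1, m1, s1, m2, s2):
    """X2 score equivalent to x1 under [13:13]: equal standard scores."""
    z = (x1 - m1) / s1
    return m2 + s2 * z


# Hypothetical statistics: X1 has M = 50, sigma = 10; X2 has M = 100, sigma = 15.
print(equivalent_standard_score(60, 50, 10, 100, 15))
```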


The estimated true standard score method: When the relative chance errors in X₁ and X₂ are unequal, an improved standard score method based upon estimates of population statistics is to be preferred. In this instance it is assumed that the underlying functions are, except for chance, the same; and, if these underlying true measures are designated X₁∞ and X₂∞, that M₁∞ ≐ M₂∞; that σ₁∞ ≐ σ₂∞; and that the forms of distribution of X₁∞ and of X₂∞ are the same. Then X₁∞ ≐ X₂∞ when

$$ \frac{X_{1\infty} - M_{1\infty}}{\sigma_{1\infty}} = \frac{X_{2\infty} - M_{2\infty}}{\sigma_{2\infty}} $$

The sample estimates of the true statistics are

$$ \tilde{M}_{1\infty} = M_1, \qquad \tilde{\sigma}_{1\infty} = \sigma_1\sqrt{r_{11}}, \qquad \tilde{X}_{1\infty} - M_1 = r_{11}(X_1 - M_1) $$

and similarly for the second variable, so that X₁ ≐ X₂ when

$$ \frac{(X_1 - M_1)\sqrt{r_{11}}}{\sigma_1} = \frac{(X_2 - M_2)\sqrt{r_{22}}}{\sigma_2} \qquad \text{Equivalent estimated true standard scores} $$
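The estimated true standard score rule can be solved for X₂ in the same way. In this sketch r₁₁ and r₂₂ are taken to be the reliability coefficients of the two tests; the function name and all figures are hypothetical.

```python
import math


def equivalent_estimated_true_score(x1, m1, s1, r11, m2, s2, r22):
    """X2 score whose estimated true standard score equals that of x1.

    Solves (X1 - M1) * sqrt(r11) / s1 = (X2 - M2) * sqrt(r22) / s2 for X2.
    """
    if not (0 < r11 <= 1 and 0 < r22 <= 1):
        raise ValueError("reliability coefficients must lie in (0, 1]")
    z_true = (x1 - m1) * math.sqrt(r11) / s1
    return m2 + s2 * z_true / math.sqrt(r22)


# Hypothetical: a long, reliable test (r11 = .90) and a short one (r22 = .60).
print(equivalent_estimated_true_score(60, 50, 10, 0.90, 100, 15, 0.60))
```

Here the equivalent X₂ score (about 118.4) lies farther from M₂ than the 115 given by equal observed standard scores: since the less reliable X₂ scores are regressed more strongly toward their mean when true scores are estimated, a more extreme observed X₂ is needed to reach the same estimated true standing. With equal reliabilities the two rules coincide.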
The ratio method: A common procedure in economics is to consider certain price indexes or ratios to be equivalent. Economic considerations generally justify the assumption of a common zero point, or that zero price for one commodity is equivalent to zero price for a second. If there is justification for equating another point, such as M₁ ≐ M₂, or B₁ ≐ B₂, in which B₁ and B₂ are some justified par, maturity values, or the like, and if the distribution of X₁ is of the same form as that of X₂, then X₁ ≐ X₂ when

$$ \frac{X_1}{B_1} = \frac{X_2}{B_2} \quad \left(\text{or } \frac{X_1}{M_1} = \frac{X_2}{M_2}\right) \qquad \text{Equivalent ratios} $$
SECTION 5. QUOTIENTS
There are numberless situations where some quotient X/Y is given as the crucial terminal statistic. One of the sounder trends in the study of variability and of distribution is the employment of the variance ratio. In education and psychology, intelligence quotients, accomplishment quotients, etc. have been widely used. These procedures have very different merit because the quotients have very different properties. The variance ratio is a very special quotient in that the presumptive form of distribution of the numerator is known, as is that of the denominator; the two variances are uncorrelated; and the form of distribution of the quotient is known and available in tables. The general quotient, X/Y, is far less tractable in its algebraic handling than the variance ratio. The variance error of X/Y is generally unknown and is always difficult to compute, the difficulty in some cases being insurmountable. Of course the determination of the complete distribution of X/Y is still more difficult.

The accomplishment quotient, which has been defined as an achievement age divided by a mental age, has been used by some psychologists and educators for years without attaching a standard error, or other measure of variability, to it. With such limitation the quotient is not a creditable statistic. Investigators should endeavor so to set their problems that it is not needed. Economists have been equally free in the use of quotients having unknown standard errors.

The urge to employ a quotient resides in a number of facts: the quotient concept as applied to measures having no standard errors, such as the abstract numbers of mathematics, is well understood by everyone with elementary school training; the quotient frequently eliminates from the picture an irrelevant unit of measure; and the quotient may reveal a stability not otherwise apparent, as, e.g., does the intelligence quotient in a way and to a degree not immediately obvious in the mental age or gross intelligence test score.

The real understanding of a quotient based upon observational data involves a first step of understanding the meaning of the numerator and of the denominator separately, and what is meant by division, and a second step of appreciating the trustworthiness to be attached to numerator, to denominator, and finally to the quotient. For practically all quotients the first step is simple, but for only a few is this true of the second step.

Consider two quotients: (a) an accomplishment quotient, which equals an achievement age divided by a mental age, and (b) a variance ratio $V_{a.m}/V_{a:m}$, in which $V_{a.m}$ is the variance of that part of achievement age which is independent of mental age and $V_{a:m}$ is the variance of that part of achievement age which is dependent upon mental age. These two quotients bear upon the same fundamental issue and answer the same fundamental question. For the first quotient the first step in understanding is simple, but the second is so involved that it is usually entirely neglected. For the second, the first step in understanding is somewhat more difficult, but the second is quite simple, in that one can avail himself of the known form of distribution of the variance ratio.

Should fundamental findings be reached by variance ratios, but reported, if necessary on account of the level of understanding of the audience, by such quotients as accomplishment quotients, the net outcome might be substantially accurate and informative. Failing this procedure, the reader of an article reporting quotients is left with the impression that the writer is as little aware of the error as is the reader.

The standard error of a quotient is infrequently reported in the experimental literature, though a fairly simple formula giving it is available. To a first approximation (i.e., an approximation involving terms of the order 1/N, but neglecting those of order 1/N² and higher) the variance error of the ratio X/Y is

$$ \sigma^2_{X/Y} = \frac{X^2}{Y^2}\left(\frac{\sigma_X^2}{X^2} + \frac{\sigma_Y^2}{Y^2} - \frac{2\,r_{XY}\,\sigma_X\,\sigma_Y}{XY}\right) \qquad \text{Variance error of a quotient} \qquad [13:16] $$


There are certain shortcomings in the use of this formula. X and Y must be positive. The formula only applies when the quantity σ_Y/Y is sufficiently small that neglect of powers of this quantity beyond the first will constitute no material inaccuracy. The entire distribution of X/Y is generally non-normal, so that the interpretation of the critical ratio (the quotient divided by its standard error) by means of the probabilities given in a normal distribution will not be precise. Finally, the correlation between X and Y is required, and there may be difficulties in obtaining this. With all its limitations, a far wider use of this formula to give the standard error of a quotient would be a great improvement over the common practice of entirely neglecting the matter.
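To make [13:16] concrete, the sketch below evaluates the first-order standard error of X/Y and compares it with a Monte Carlo estimate drawn under an assumed bivariate normal distribution of X and Y. The accomplishment-quotient figures are invented for illustration, and the function names are hypothetical.

```python
import math

import numpy as np


def quotient_standard_error(x, y, se_x, se_y, r_xy):
    """First-order standard error of X/Y, following [13:16]."""
    if x <= 0 or y <= 0:
        raise ValueError("[13:16] presupposes positive X and Y")
    rel_var = (se_x / x) ** 2 + (se_y / y) ** 2 - 2.0 * r_xy * (se_x / x) * (se_y / y)
    return (x / y) * math.sqrt(rel_var)


def simulated_quotient_sd(x, y, se_x, se_y, r_xy, draws=200_000, seed=0):
    """Monte Carlo check: SD of X/Y when (X, Y) is bivariate normal (an assumption)."""
    cov = [[se_x ** 2, r_xy * se_x * se_y], [r_xy * se_x * se_y, se_y ** 2]]
    rng = np.random.default_rng(seed)
    xs, ys = rng.multivariate_normal([x, y], cov, size=draws).T
    return float(np.std(xs / ys))


# Hypothetical: achievement age 120 months (S.E. 6), mental age 110 months (S.E. 5), r_XY = .6.
print(quotient_standard_error(120, 110, 6, 5, 0.6))
print(simulated_quotient_sd(120, 110, 6, 5, 0.6))
```

With σ_Y/Y near .05, as here, the two values agree closely; as σ_Y/Y grows, the neglected higher-order terms, and the non-normality of X/Y noted above, make the first-order figure less trustworthy.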

The quotient

$$ Q = \frac{r_{12}^2}{r_{13}^2} $$

may be called a quotient of determination coefficients. It is indicative of the relative value of x₂ and x₃ as predictors of x₁. Accordingly, a significant excess of this ratio over 1.00 indicates that x₂ is the better predictor. The variance of Q as derived by Arthur C. Hoffman (in an unpublished paper) is

$$ \sigma_Q^2 = \frac{4Q^2}{N}\left[\frac{1}{r_{12}^2} + \frac{1}{r_{13}^2} - \frac{2r_{23}}{r_{12}r_{13}} + \frac{2r_{13}r_{23}}{r_{12}} + \frac{2r_{12}r_{23}}{r_{13}} - r_{23}^2 - 3\right] \qquad [13:16a] $$

If N is, say, greater than 25, it would seem safe to interpret the critical ratio (Q − 1)/σ_Q in terms of the probabilities of the normal distribution.
SECTION 6. THE MOST RELIABLE WEIGHTED AVERAGE OF INDEPENDENT MEASURES

If the measures to be averaged, X₁, X₂, X₃, ..., have known variance errors, we may write each of them as equal to a true measure plus an error, thus:

$$ X_i = X_\infty + e_i $$

If these measures are averaged, weighting them w₁, w₂, w₃, etc., we have:

$$ M = \frac{w_1X_1 + w_2X_2 + w_3X_3 + \cdots}{w_1 + w_2 + w_3 + \cdots} = X_\infty + \frac{w_1e_1 + w_2e_2 + w_3e_3 + \cdots}{w_1 + w_2 + w_3 + \cdots} \qquad [13:17] $$

The variance error of M is equal to the variance of the second term of the right-hand member of [13:17]. The various e's are uncorrelated, as they are error functions. Knowing the variances of these several e's, which in harmony with earlier notation we designate V₁.∞, V₂.∞, V₃.∞, etc., we desire so to determine the w's that the variance error of M is minimal. We will first solve this problem for the case of two variables and will let

$$ p = \frac{w_1}{w_1 + w_2}, \qquad q = \frac{w_2}{w_1 + w_2} = 1 - p, \qquad M = pX_1 + qX_2 $$

The variance error of M is then

$$ p^2V_{1.\infty} + q^2V_{2.\infty} = p^2(V_{1.\infty} + V_{2.\infty}) - 2pV_{2.\infty} + V_{2.\infty} = (V_{1.\infty} + V_{2.\infty})\left[p - \frac{V_{2.\infty}}{V_{1.\infty} + V_{2.\infty}}\right]^2 + \frac{V_{1.\infty}V_{2.\infty}}{V_{1.\infty} + V_{2.\infty}} $$

The [] term is a perfect square and is the only term containing p, so the entire function takes its smallest value when the [] term = 0. Solving,

$$ \frac{p}{q} = \frac{w_1}{w_2} = \frac{1/V_{1.\infty}}{1/V_{2.\infty}} \qquad [13:18] $$

The optimal weights are thus proportional to the inverses of the variance errors.

If we have k variables, k − 1 of which are combined (using either optimal or non-optimal weights) and treated as a single variable, we can combine this with the k'th variable in the optimal manner, and the relative weight of the k'th variable, by [13:18], will be as the inverse of its variance error. We accordingly have for the optimal average of any number of variables:

$$ M = \frac{\dfrac{X_1}{V_{1.\infty}} + \dfrac{X_2}{V_{2.\infty}} + \dfrac{X_3}{V_{3.\infty}} + \cdots}{\dfrac{1}{V_{1.\infty}} + \dfrac{1}{V_{2.\infty}} + \dfrac{1}{V_{3.\infty}} + \cdots} \qquad \text{Most reliable weighted average} \qquad [13:19] $$

The most reliable weighted average of measures whose errors are independent is obtained by making the weights equal to, or proportional to, the inverses of their variance errors.
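A minimal sketch of [13:19], which also returns the variance error of the weighted mean, 1/Σ(1/V), and confirms the two-measure result by brute force over p. The function names and the two measures are hypothetical.

```python
def most_reliable_average(values, variance_errors):
    """Weighted mean with weights 1/V_i.inf, as in [13:19], and its variance error."""
    if any(v <= 0 for v in variance_errors):
        raise ValueError("variance errors must be positive")
    weights = [1.0 / v for v in variance_errors]
    total = sum(weights)
    mean = sum(w * x for w, x in zip(weights, values)) / total
    # Var(sum w_i e_i / sum w_i) = sum w_i**2 V_i / (sum w_i)**2, which is 1 / sum(1/V_i) here.
    return mean, 1.0 / total


def two_measure_variance(p, v1, v2):
    """Variance error of p*X1 + (1 - p)*X2 when the errors are independent."""
    return p ** 2 * v1 + (1 - p) ** 2 * v2


# Hypothetical measures: X1 = 10 with V = 1, X2 = 20 with V = 4.
print(most_reliable_average([10.0, 20.0], [1.0, 4.0]))
# Brute-force search over p in steps of .001: the minimum falls at p = 1/V1 / (1/V1 + 1/V2) = .8.
print(min((two_measure_variance(k / 1000, 1.0, 4.0), k / 1000) for k in range(1001)))
```

For comparison, the unweighted mean of the same two measures has variance error (1 + 4)/4 = 1.25, against .8 for the optimal weights.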


SECTION 7. FITTING OF CURVES TO OBSERVATIONS

Two broad categories are readily distinguishable: the fitting of curves to frequency distributions and the fitting of regression lines. In the first case goodness-of-fit is attested by the degree of agreement between the observed frequencies in the classes employed and the theoretical frequencies, or frequencies in the case of the fitted curve, in the classes. In the second case goodness-of-fit is attested by the smallness of the quantitative deviations of the observed values from the regression line.

We shall here consider the fitting of frequency distributions by the Pearson method. An alternative procedure developed by Charlier (see Rietz, op. cit., Ch. VII) will not be expounded, but it has special advantages when it is desirable to express a distribution as a cumulative function of normal distributions.

The Pearson system (Pearson 1894, 1895, 1901, 1902 Cont., 1914; Elderton 1927; Craig 1936; Rietz 1924, Ch. VII by H. C. Carver) of unimodal, including antimodal, frequency distributions is very comprehensive in that it covers the broad range of types shown on Charts XIII III and XIII IV. It employs only two, three, or four parameters in the definition of its curves, and these are determined by the method of moments. This method establishes the equality of the moments of the fitted curve to those of the data up to the number needed, e.g., four (mean, variance, μ₃, and μ₄) in the case of a curve with four constants.

Further, since these moments are linear functions of the frequencies in the classes, the exact number of degrees of freedom represented by the variance of the differences between the observed frequencies in the classes and the theoretical frequencies, or those given by the corresponding areas under the fitted curve, is (k − par): the number of classes employed minus the number of parameters. This enables a precise χ² test of goodness-of-fit.

The type of curve is a function of the moments. The first moment, in all cases, merely locates the curve along the x-axis. The second moment, in all cases, merely measures variability, or, we can say, merely reflects the units in which x has been measured. Thus the important differences in type depend upon the third and fourth moments. Accordingly, all the different Pearson types can be represented by regions in a plane with dimensions μ₃²/σ⁶ (= Pearson's β₁ = Carver's α₃²) and μ₄/σ⁴ (= Pearson's β₂ = Carver's α₄). Rhind (in a chart used by Pearson) has charted all regions, using as axes β₁ and β₂ [7:12] and [7:03]. We herewith give a chart of this sort, Chart XIII III.

Charts VII I, VII II, and VII III, in Chapter 
VII, use Pearson notation and illustrate the 
range of curves covered by different Pearson 
types. 

Two criteria used by Pearson are κ₁ and κ₂, defined as follows in terms of β₁, β₂, and Craig's δ:

$$ \kappa_1 = 2\beta_2 - 3\beta_1 - 6 = \frac{3\delta(\beta_1 + 4)}{2 - \delta} \qquad [13:20] $$

$$ \kappa_2 = \frac{\beta_1(\beta_2 + 3)^2}{4(4\beta_2 - 3\beta_1)(2\beta_2 - 3\beta_1 - 6)} = \frac{\beta_1}{4\delta(2 + \delta)} \qquad [13:21] $$

Craig gives a chart of similar purpose having dimensions β₁ and δ, where

$$ \delta = \text{Craig's } \delta = \frac{2\beta_2 - 3\beta_1 - 6}{\beta_2 + 3} \qquad [13:22] $$

This δ is the same as Pearson's a (see 1914, Part I, p. lxi). The use of Craig's δ, i.e., Pearson's a, leads to simplifications in equations and in charts. We also reproduce herewith, with permission, Craig's chart, Chart XIII IV.
CHART XIII IV

THE (α₃², δ) CHART FOR THE PEARSON SYSTEM OF FREQUENCY CURVES

(The subscript L refers to bell-shaped curves)
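The coordinates of a sample on Craig's chart follow directly from its moments. The sketch below computes β₁, β₂, δ by [13:22], κ₁ by the first form of [13:20], and κ₂ by the δ form of [13:21], using moments about the mean with divisor N (no small-sample corrections). The function name and the example data are hypothetical.

```python
import math


def pearson_criteria(data):
    """beta1, beta2, Craig's delta and Pearson's kappa1, kappa2 from raw data."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    beta1 = m3 ** 2 / m2 ** 3
    beta2 = m4 / m2 ** 2
    kappa1 = 2 * beta2 - 3 * beta1 - 6       # [13:20], first form
    delta = kappa1 / (beta2 + 3)             # [13:22]
    denom = 4 * delta * (2 + delta)          # [13:21], delta form
    kappa2 = beta1 / denom if denom != 0 else math.inf
    return beta1, beta2, delta, kappa1, kappa2


# Two equally frequent categories: the Type M point of Table XIII E (beta1 = 0, delta = -1).
print(pearson_criteria([0, 1] * 50))
# A long run of equally spaced values: close to the rectangular point (delta = -.5).
print(pearson_criteria(list(range(1000))))
```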


On Chart XIII III are lines indicating where certain moments become infinite. This, of course, refers to the moments of the fitted curve only, for any positive moment obtained from finite data cannot be infinite. The lines indicating infinite negative moments

$$ \mu_{-n} = \frac{\sum x^{-n}}{N} \qquad (x \text{ a deviation from a boundary of the curve}) \qquad [13:23] $$

also refer to the fitted curve, though an infinite negative moment is a possibility with finite data. The bearing of infinite positive and negative moments upon the stability of distributions has been discussed by the writer in Statistical Method (1923). It is especially noteworthy that Type III is the only general type for which all positive moments are finite, that Type V is the only one for which all negative moments are finite, and that the limiting distribution represented by the intersection of the Type III and Type V lines, namely the normal distribution, is the only one having the stability that attaches to finite positive and negative moments of all orders.

The location on Craig's diagram of certain 
types and also certain properties of these types 
are given in Tables XIII D and XIII E, herewith. 



TABLE XIII D

PEARSON TYPES INVOLVING THREE PARAMETERS

TYPE               LOCATION ON CRAIG'S CHART; PROPERTIES
Two-category       δ = −1. Lower limit in δ of possible distributions. U-shaped.
XII                δ = −.5. J-shaped.
III                δ = 0. Upper limit in δ for which all positive moments < ∞.
                   A-shaped for β₁ < 4; J-shaped for β₁ > 4.
                   δ = .4. Upper limit in δ for which μ₈ < ∞.
                   δ = 1. Upper limit in δ for which μ₅ < ∞.
                   δ = 2. Upper limit in δ for which μ₄ < ∞.
II and VII         β₁ = 0. All symmetrical distributions.
II                 −1 < δ < −.5, U-shaped; −.5 < δ < 0, A-shaped.
VII                0 < δ < 2, A-shaped.
VIII, IX, and XI   β₁(2 + 3δ) = (2 + 4δ)²(2 + δ)   [13:24]
VIII               −1 < δ < −.5; transition between U- and J-shaped.
IX                 −.5 < δ < 0; transition between J- and A-shaped.
XI                 0 < δ < 2; transition between J- and A-shaped.
V                  β₁ = 4δ(2 + δ)   [13:25]


TABLE XIII E

TYPES INVOLVING TWO PARAMETERS

TYPE     β₁     δ      DISTRIBUTION
M        0      −1     Mendelian distribution (equal frequencies in two discrete classes).
R        0      −.5    Rectangular distribution.
N        0      0      Normal distribution.
X or E   4      0      Exponential distribution.
L        .32    −.4    Straight line distribution.
P        0      −1/3   Parabolic distribution.

Let the general equation covering all the Pearson curve types be

$$ y = f(x) \qquad [13:26] $$

for which the differential equation is

$$ \frac{1}{y}\frac{dy}{dx} = \frac{a - x}{b_0 + b_1x + b_2x^2} \qquad [13:27] $$

Written in terms of α₃ and δ (Craig's), with x measured in standard-score units, and valid except when δ = −.5, this equation is

$$ \frac{1}{y}\frac{dy}{dx} = \frac{-\dfrac{\alpha_3}{2(1+2\delta)} - x}{\dfrac{2+\delta}{2(1+2\delta)} + \dfrac{\alpha_3}{2(1+2\delta)}\,x + \dfrac{\delta}{2(1+2\delta)}\,x^2} \qquad [13:27a] $$


This covers an extensive class of unimodal, or A-shaped, of J-shaped, and of antimodal, or U-shaped, curves. Some of these are unlimited in both directions, some limited in one direction, and some limited in both directions, depending upon the specific values of the parameters a, b₀, b₁, b₂. The unimodal requirement limits the numerator of the right-hand member of the differential equation to x to the first power only. The limitation of the denominator to x to the second power is arbitrary, but sufficiently general to encompass a very wide variety of distributions.

The curve may be fitted by the method of moments. This assures that the moments of the curve up to the requisite number agree with the moments of the data. In general it requires the first four moments, M, V, μ₃, μ₄, to determine the four parameters a, b₀, b₁, b₂. Having these, the integration of [13:27] yields the equation of distribution [13:26].

It simplifies matters to deal with standard scores, so, following Carver (see Rietz, 1924, Ch. VII) and Craig (1936), we let

$$ \alpha_n = \frac{\sum (X - M)^n}{N\sigma^n} \qquad n\text{-th moment in terms of standard scores} \qquad [13:28] $$

The following recursion formula for moments holds:

$$ a\,\alpha_n + n\,\alpha_{n-1}b_0 + (n+1)\,\alpha_n b_1 + (n+2)\,\alpha_{n+1}b_2 = \alpha_{n+1} \qquad [13:29] $$

Setting n successively equal to 0, 1, 2, 3 leads to four condition equations, from which we derive the following, valid except when δ = −.5:

$$ 2(1+2\delta)\,a = -\alpha_3 \qquad [13:30] $$

$$ 2(1+2\delta)\,b_0 = 2 + \delta \qquad [13:31] $$

$$ 2(1+2\delta)\,b_1 = \alpha_3 \qquad [13:32] $$

$$ 2(1+2\delta)\,b_2 = \delta \qquad [13:33] $$
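The four condition equations can be checked mechanically. The sketch below computes a, b₀, b₁, b₂ from α₃ and α₄ (with δ from [13:22]) by [13:30] to [13:33], then substitutes them back into the recursion [13:29] for n = 0 to 3; residuals near zero confirm that the set is consistent. The function names are hypothetical.

```python
def pearson_coefficients(alpha3, alpha4):
    """a, b0, b1, b2 of [13:27] in standard-score form, via [13:30] to [13:33]."""
    beta1 = alpha3 ** 2
    delta = (2 * alpha4 - 3 * beta1 - 6) / (alpha4 + 3)   # Craig's delta, [13:22]
    k = 2.0 * (1.0 + 2.0 * delta)
    if k == 0:
        raise ValueError("[13:30] to [13:33] fail when delta = -.5")
    return -alpha3 / k, (2.0 + delta) / k, alpha3 / k, delta / k


def recursion_residuals(alpha3, alpha4):
    """Left side minus right side of [13:29] for n = 0, 1, 2, 3."""
    alpha = [1.0, 0.0, 1.0, alpha3, alpha4]   # standardized moments alpha_0 .. alpha_4
    a, b0, b1, b2 = pearson_coefficients(alpha3, alpha4)
    out = []
    for n in range(4):
        prev = alpha[n - 1] if n >= 1 else 0.0   # alpha_{n-1}; its coefficient is 0 when n = 0
        lhs = a * alpha[n] + n * prev * b0 + (n + 1) * alpha[n] * b1 + (n + 2) * alpha[n + 1] * b2
        out.append(lhs - alpha[n + 1])
    return out


print(pearson_coefficients(0.0, 3.0))   # normal curve: a = 0, b0 = 1, b1 = b2 = 0
print(recursion_residuals(0.8, 4.2))
```

For the normal curve the coefficients reduce [13:27] to dy/(y dx) = −x, as they should.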

For the general case we note that δ depends upon β₁ and β₂, and that α₃ depends upon μ₃ and σ, so that the four parameters a, b₀, b₁, b₂ are determined from the first four moments. One further parameter, y₀, is dependent upon N, the number of cases in the sample, or is arbitrary. The general types are called four-parameter distributions. In all cases a simple way to determine y₀ is to choose an arbitrary value, y₀′, compute the frequencies in all classes with a grouping as fine as the precision of the data justifies, sum these frequencies, and multiply this sum by such a number λ that the product equals N. Then determine y₀ from [13:34]:

$$ y_0 = \lambda\,y_0' \qquad [13:34] $$

To determine the Pearson-type curve that fits given data, one can locate on Chart XIII III, or on Chart XIII IV, the point given by his data and note the type region, or, if near a line, the type line in which or near which the point lies. What constitutes "near a type line" is considered in probability terms in Pearson's Tables, Part I. Great simplification occurs whenever three rather than four parameters can be used, or when two rather than three or four suffice.
Fitting the most important two-parameter curves. In addition to the normal distribution, the equation and fitting of which have already been given, these are M, R, P, L, and E.

The Mendelian distribution: The equation of this takes the non-useful form

$$ y = y_0\left(1 - \frac{x^2}{\sigma^2}\right)^{-1} $$

The origin is at the mean, the limits are −σ and σ, and y₀ = 0. However, one does not need the equation to investigate any problems of goodness-of-fit and the like which may arise.

The rectangular distribution: When N = 1; M = 0; range from −σ√3 to σ√3, the equation is

$$ y = \frac{1}{2\sigma\sqrt{3}} \qquad \text{Unit rectangular distribution} \qquad [13:35] $$

The parabolic distribution: When N = 1; M = 0; range from −σ√5 to σ√5, the equation is

$$ y = \frac{3}{20\sigma^3\sqrt{5}}\,(5\sigma^2 - x^2) \qquad \text{Unit parabolic distribution} \qquad [13:36] $$

The straight line distribution: When N = 1; M = 2√2σ; origin at the left end, at which point y = 0; range from 0 to 3√2σ, the equation is

$$ y = \frac{x}{9\sigma^2} \qquad \text{Unit straight line distribution} \qquad [13:37] $$

The exponential distribution: When N = 1; M = σ = 1; origin at the left end, at which point y = 1; range from 0 to ∞, the equation is

$$ y = e^{-x} \qquad \text{Unit exponential distribution} \qquad [13:38] $$
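A quick numerical check that each unit curve above has unit area, unit standard deviation and the stated mean can be made with a midpoint-rule integration. The step count and the truncation of the exponential's infinite range at 50σ are choices made for this sketch, and the exponential is written as (1/σ)e^(−x/σ), which is [13:38] when σ = 1. Names are hypothetical.

```python
import math


def area_mean_sd(f, lo, hi, steps=100_000):
    """Midpoint-rule area, mean and standard deviation of a curve y = f(x) on [lo, hi]."""
    h = (hi - lo) / steps
    xs = [lo + (i + 0.5) * h for i in range(steps)]
    ys = [f(x) for x in xs]
    area = sum(ys) * h
    mean = sum(x * y for x, y in zip(xs, ys)) * h / area
    var = sum((x - mean) ** 2 * y for x, y in zip(xs, ys)) * h / area
    return area, mean, math.sqrt(var)


SIGMA = 1.0
ROOT3, ROOT5 = math.sqrt(3.0), math.sqrt(5.0)
UNIT_CURVES = {
    "rectangular [13:35]": (lambda x: 1 / (2 * SIGMA * ROOT3), -SIGMA * ROOT3, SIGMA * ROOT3),
    "parabolic [13:36]": (lambda x: 3 * (5 * SIGMA**2 - x**2) / (20 * SIGMA**3 * ROOT5),
                          -SIGMA * ROOT5, SIGMA * ROOT5),
    "straight line [13:37]": (lambda x: x / (9 * SIGMA**2), 0.0, 3 * math.sqrt(2.0) * SIGMA),
    # infinite range truncated at 50 sigma, where the ordinate is negligible
    "exponential [13:38]": (lambda x: math.exp(-x / SIGMA) / SIGMA, 0.0, 50.0 * SIGMA),
}

for name, (f, lo, hi) in UNIT_CURVES.items():
    area, mean, sd = area_mean_sd(f, lo, hi)
    print(f"{name:22s} area = {area:.5f}  mean = {mean:.5f}  sd = {sd:.5f}")
```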


The fitting of the three-parameter distributions to data is not difficult.

The two-category distribution: δ = −1. As with the Mendelian distribution, the equation of the curve involves the indeterminate feature 0 × ∞. All distributions in which the frequencies fall into two discrete categories are here represented. Letting the scores attaching to these categories be 0 and 1, and letting the proportions in them be q and p, then all the moments are functions of p, as given in Table IX A, and the distribution is completely defined. An a priori value of p is necessary if any test of goodness-of-fit is involved.

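Type XII: δ = −.5. When N = 1; M = 0; α₃ = μ₃/σ³; range from −σ(√(3 + β₁) − α₃) to σ(√(3 + β₁) + α₃), the equation is

$$ y = y_0\left[\frac{\sigma\left(\sqrt{3+\beta_1} + \alpha_3\right) - x}{\sigma\left(\sqrt{3+\beta_1} - \alpha_3\right) + x}\right]^{\alpha_3/\sqrt{3+\beta_1}} \qquad \text{Type XII} \qquad [13:39] $$

If y₀ is determined so that the total area = 1,

$$ y_0 = \frac{1}{2\sigma\sqrt{3+\beta_1}\;\Gamma\!\left(1 + \dfrac{\alpha_3}{\sqrt{3+\beta_1}}\right)\Gamma\!\left(1 - \dfrac{\alpha_3}{\sqrt{3+\beta_1}}\right)} \qquad \text{Ordinate at the point } x = \sigma\alpha_3 \qquad [13:40] $$

To five decimal places the gamma function values may be gotten from Table XIV A.

As a sample we give herewith the Type XII equation corresponding to the (δ = −.5, β₁ = 1) point, when μ₃ is positive, M = 0, σ = 1, N = 1, and the range is from −1 to 3:

$$ y = \frac{1}{2\pi}\sqrt{\frac{3 - x}{1 + x}} \qquad \text{Unit Type XII distribution when } \beta_1 = 1 \text{ and } \mu_3 \text{ is positive} $$

This is a twisted J-shaped curve, i.e., one in which the low end of the J turns down a little.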
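The constants of a unit Type XII curve can be computed directly from α₃. The sketch below follows the limits, exponent and ordinate of [13:39] and [13:40] as stated above, using math.gamma in place of Table XIV A, and reproduces the β₁ = 1 example (limits −1 and 3, exponent .5, y₀ = 1/2π). The function name is hypothetical.

```python
import math


def type_xii(alpha3, sigma=1.0):
    """Limits, exponent, y0 and curve of the unit (N = 1, M = 0) Type XII distribution."""
    root = math.sqrt(3.0 + alpha3 ** 2)     # sqrt(3 + beta1)
    m = alpha3 / root                        # exponent in [13:39]; always between -1 and 1
    lower = -sigma * (root - alpha3)
    upper = sigma * (root + alpha3)
    y0 = 1.0 / (2.0 * sigma * root * math.gamma(1.0 + m) * math.gamma(1.0 - m))   # [13:40]

    def y(x):
        # [13:39]; defined strictly inside (lower, upper)
        return y0 * ((upper - x) / (x - lower)) ** m

    return lower, upper, m, y0, y


lower, upper, m, y0, y = type_xii(alpha3=1.0)
print(lower, upper, m, y0, 1 / (2 * math.pi))
```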

Type III: δ = 0. When N = 1; origin at the mode; range from −a to ∞, the equation is

$$ y = y_0\left(1 + \frac{x}{a}\right)^{A-1}\exp\left(-\frac{2x}{\alpha_3\sigma}\right) \qquad \text{Type III} \qquad [13:41] $$

in which x is a deviation from the mode, and

$$ Mo = M - .5\,\alpha_3\sigma \qquad [13:42] $$

$$ \alpha_3 = \frac{\mu_3}{\sigma^3} \qquad [13:43] $$

$$ a = \frac{2\sigma}{\alpha_3} - \frac{\alpha_3\sigma}{2} \qquad [13:44] $$

$$ A = \frac{4}{\beta_1} \qquad [13:45] $$

$$ y_0 = \frac{N}{a}\left[\frac{(A-1)^{A}}{e^{A-1}\,\Gamma(A)}\right] \qquad \text{Ordinate at the mode} \qquad [13:46] $$

in which the reciprocal of the [] term is given by [14:50]. When β₁ = 0 this reduces to the normal distribution, and when β₁ = 4 it becomes the exponential. Herewith is given as an example the Type III equation when μ₃ is positive, β₁ = 1, σ = 1, N = 1, Mo = 0, the range then being from −1.5 to ∞:

$$ y = \frac{1}{3}\,e^{-(2x+3)}(3 + 2x)^3 \qquad \text{Unit Type III} \qquad [13:47] $$

The distribution of variances from samples drawn from a normal population is of Type III. This type has been found to be widely descriptive of phenomena. Areas, ordinates and derivatives for Type III distributions have been tabled by Salvosa (1930).
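A Type III fit from M, σ and α₃ can be sketched in a few lines. The sketch uses Mo = M − .5α₃σ, a = 2σ/α₃ − α₃σ/2, A = 4/β₁ and y₀ = N(A − 1)^A / (a e^(A−1) Γ(A)), which are the standard gamma-curve relations; it works in logarithms so that large A does not overflow, and it is limited to 0 < α₃ < 2 (β₁ < 4), the A-shaped case. It reproduces the unit example [13:47]. The function name is hypothetical.

```python
import math


def type_iii(mean, sigma, alpha3, n=1.0):
    """Mode, a, A, y0 and curve of a Type III fit, for 0 < alpha3 < 2 (beta1 < 4)."""
    if not 0 < alpha3 < 2:
        raise ValueError("sketch covers 0 < alpha3 < 2; reflect the scale if alpha3 < 0")
    mode = mean - 0.5 * alpha3 * sigma                  # Mo = M - .5 alpha3 sigma
    a = 2.0 * sigma / alpha3 - 0.5 * alpha3 * sigma     # distance from start of curve to mode
    big_a = 4.0 / alpha3 ** 2                           # A = 4 / beta1
    # y0 = N (A - 1)**A / (a e**(A - 1) Gamma(A)), evaluated in logarithms
    y0 = n * math.exp(big_a * math.log(big_a - 1.0) - (big_a - 1.0) - math.lgamma(big_a)) / a

    def y(x):
        # x is a deviation from the mode; the curve starts at x = -a
        return y0 * (1.0 + x / a) ** (big_a - 1.0) * math.exp(-2.0 * x / (alpha3 * sigma))

    return mode, a, big_a, y0, y


mode, a, big_a, y0, y = type_iii(mean=0.5, sigma=1.0, alpha3=1.0)
print(mode, a, big_a, y0, 9 * math.exp(-3))
```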
Type II: β₁ = 0; −1 < δ < 0 (i.e., 1 < β₂ < 3). When N = 1; origin at the mean; range from −a to a, the equation is

$$ y = y_0\left(1 - \frac{x^2}{a^2}\right)^{m} \qquad \text{Type II distribution} \qquad [13:48] $$

$$ M = Mo \text{ (or anti-mode)} = Mdn = 0 \qquad [13:49] $$

$$ m = \frac{5\beta_2 - 9}{2(3 - \beta_2)} \qquad [13:50] $$

$$ a^2 = \frac{2\beta_2\sigma^2}{3 - \beta_2} \qquad [13:51] $$

$$ y_0 = \frac{\Gamma(m + 1.5)}{a\sqrt{\pi}\,\Gamma(m + 1)} \qquad \text{Ordinate at the mean} \qquad [13:52] $$

When δ = −1 (i.e., β₂ = 1), a Mendelian distribution of equal frequencies in two classes results; see Chart VII I, Type M.

When −1 < δ < −.5 (i.e., 1 < β₂ < 1.8), m is negative and a U-shaped curve results; see Chart VII I, Type II-U.

When −.5 < δ < 0 (i.e., 1.8 < β₂ < 3), m is positive and a platykurtic A-shaped curve results; see Chart VII I, Type II-A.

When δ = −1/3 (i.e., β₂ = 15/7) the distribution is parabolic; see Chart VII I, Type P.

When δ = 0 (i.e., β₂ = 3), a normal distribution results.

Type VII: β₁ = 0; 0 < δ < 2 (i.e., 3 < β₂ < ∞). When N = 1; origin at the mean; range from −∞ to ∞, the equation is

$$ y = y_0\left(1 + \frac{x^2}{a^2}\right)^{-m} \qquad \text{Type VII distribution} \qquad [13:53] $$

$$ M = Mo = Mdn = 0 \qquad [13:54] $$

$$ m = \frac{5\beta_2 - 9}{2(\beta_2 - 3)} \qquad [13:55] $$

$$ a^2 = \frac{2\beta_2\sigma^2}{\beta_2 - 3} \qquad [13:56] $$

$$ y_0 = \frac{\Gamma(m)}{a\sqrt{\pi}\,\Gamma(m - .5)} \qquad \text{Ordinate at the mean} \qquad [13:57] $$

For illustration, see Chart VII I, curve A.
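For a symmetrical distribution the choice between Type II and Type VII, and the constants of either, depend only on β₂ and σ. The sketch below applies [13:50] to [13:52] or [13:55] to [13:57] as written above (with N = 1 by default) and reproduces the parabolic point P (β₂ = 15/7 gives m = 1, a = σ√5, y₀ = 3/(4σ√5), matching [13:36]). The function name is hypothetical.

```python
import math


def symmetric_pearson(beta2, sigma=1.0, n=1.0):
    """Type II (1 < beta2 < 3) or Type VII (beta2 > 3) fit for a symmetrical distribution."""
    if 1 < beta2 < 3:                                        # Type II, [13:50] to [13:52]
        m = (5 * beta2 - 9) / (2 * (3 - beta2))
        a = sigma * math.sqrt(2 * beta2 / (3 - beta2))
        y0 = n * math.gamma(m + 1.5) / (a * math.sqrt(math.pi) * math.gamma(m + 1))
        return "II", m, a, y0, (lambda x: y0 * (1 - x ** 2 / a ** 2) ** m)
    if beta2 > 3:                                            # Type VII, [13:55] to [13:57]
        m = (5 * beta2 - 9) / (2 * (beta2 - 3))
        a = sigma * math.sqrt(2 * beta2 / (beta2 - 3))
        y0 = n * math.gamma(m) / (a * math.sqrt(math.pi) * math.gamma(m - 0.5))
        return "VII", m, a, y0, (lambda x: y0 * (1 + x ** 2 / a ** 2) ** (-m))
    raise ValueError("beta2 must exceed 1; beta2 = 3 is the normal curve")


print(symmetric_pearson(15 / 7)[:4])   # parabolic point P
print(symmetric_pearson(9.0)[:4])
```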

Types VIII, IX, and XI: β₁(2 + 3δ) = (2 + 4δ)²(2 + δ), as given in [13:24]. In the equations for these three types occurs an exponent, m, which is found by solving the accompanying cubic,

$$ (\beta_1 - 4)\,m^3 + (9\beta_1 - 12)\,m^2 + 24\beta_1 m + 16\beta_1 = 0 \qquad [13:58] $$

The solution may follow the steps indicated in Chapter XIV, Section 7. A plot of [13:58] shows that, in addition to an extraneous branch, there is a branch passing through the following points: β₁ = ∞, m = −1; β₁ = 0, m = 0; β₁ = 4, m = ±∞; β₁ = ∞, m = −4. The region between the first two points yields Type VIII distributions, and herein −1 < m < 0. The region between the second and third points yields Type IX distributions, and herein 0 < m < ∞. The region between the third and fourth points yields Type XI distributions, and herein −∞ < m < −4. The distributions represented by special points upon this cubic are R, L, and E, as noted in Table XIII E. A convenient form of the equation of the curve for Types VIII and IX is

$$ y = y_0\left(1 + \frac{x}{a}\right)^{m} \qquad \text{Types VIII and IX} \qquad [13:59] $$
and for Type XI,

$$ y = y_0 \cdots \qquad \text{Type XI} \qquad [13:60] $$

The separate and specific features of these three types will not be considered.

Type VIII: −1 < m < 0; 0 < β₁ < ∞; and also −1 < δ < −.5. A limited-range transition type between U- and J-shaped curves. The equation as written has N = 1, and the origin at the right boundary when μ₃ > 0 and at the left boundary when μ₃ < 0. When μ₃ > 0 the range is from −a to 0, and when μ₃ < 0 it is from 0 to −a (−a being a positive magnitude).

$$ a = \pm\,\sigma(2 + m)\sqrt{\frac{3 + m}{1 + m}} \qquad \text{The sign of } a \text{ is the same as that of } \mu_3 \qquad [13:61] $$

$$ y_0 = \frac{1 + m}{\pm a} \qquad \text{Sign so chosen that } y_0 > 0. \text{ Ordinate at boundary.} \qquad [13:62] $$

$$ M = -\frac{a}{2 + m} \qquad [13:63] $$


Type IX: 0 < m < ∞; 0 < β₁ < 4; and also −.5 < δ < 0. A limited-range transition type between J- and A-shaped curves. As written, the origin is at the left boundary when μ₃ > 0 and at the right boundary when μ₃ < 0. The range is from 0 to −a when μ₃ > 0 and from −a to 0 when μ₃ < 0.

$$ a = \pm\,\sigma(2 + m)\sqrt{\frac{3 + m}{1 + m}} \qquad \text{The sign of } a \text{ is opposite that of } \mu_3 \qquad [13:61a] $$

y₀ is given by [13:62].

M is given by [13:63].
Type XI. −∞ < m < −4; 4 < β₁ < ∞; and also 0 < S < 2. The curve is unlimited at one end and is a transition type between U- and A-shaped curves. When μ₃ > 0 (and the scale of measurement can always be so chosen that this is true) the equation as written has N = 1, and the origin at 0, which is the distance a to the left of the left boundary of the curve. The range is from a to ∞.

$$a = -\sigma(2+m)\sqrt{\frac{3+m}{1+m}} \qquad (\text{when } \mu_3 > 0) \qquad [13:64]$$

$$y_0 = -(1+m)\,a^{-(1+m)} \qquad \text{Ordinate at } x = 0\text{, a point outside the range} \qquad [13:65]$$

$$M = \frac{a(1+m)}{2+m} \qquad \text{Mean measured from the origin} \qquad [13:66]$$

The mean is −a/(2 + m) to the right of the left boundary a.
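The constants for Types IX and XI are fixed so that each curve has unit area, the stated mean, and variance σ². A rough numerical check is sketched below in Python with a plain midpoint rule, assuming σ = 1 and the illustrative exponents m = 2 (Type IX, μ₃ > 0) and m = −8 (Type XI). For Type XI the value a = −σ(2 + m)√((3 + m)/(1 + m)) is assumed as the counterpart of [13:61], and the substitution u = a/x maps the unlimited range onto (0, 1].

```python
import math


def midpoint(f, lo, hi, n=20000):
    """Midpoint-rule integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (k + 0.5) * h) for k in range(n))


sigma = 1.0

# Type IX with mu3 > 0 (a negative, range 0 to -a), illustrative m = 2.
m9 = 2.0
a9 = -sigma * (2 + m9) * math.sqrt((3 + m9) / (1 + m9))    # [13:61a]
y0_9 = (1 + m9) / -a9                                       # [13:62], y0 > 0


def dens9(x):
    return y0_9 * (1 + x / a9) ** m9                        # [13:59]


area9 = midpoint(dens9, 0.0, -a9)
mean9 = midpoint(lambda x: x * dens9(x), 0.0, -a9)
var9 = midpoint(lambda x: (x - mean9) ** 2 * dens9(x), 0.0, -a9)
print(area9, mean9, -a9 / (2 + m9), var9)                  # mean vs [13:63]

# Type XI with mu3 > 0 (range a to infinity), illustrative m = -8.
m11 = -8.0
a11 = -sigma * (2 + m11) * math.sqrt((3 + m11) / (1 + m11))  # assumed, see lead-in
y0_11 = -(1 + m11) * a11 ** (-(1 + m11))                     # [13:65]


def dens11_u(u):
    """Density y0 * x**m at x = a/u, times |dx/du| = a/u**2."""
    return y0_11 * (a11 / u) ** m11 * a11 / u ** 2


area11 = midpoint(dens11_u, 0.0, 1.0)
mean11 = midpoint(lambda u: (a11 / u) * dens11_u(u), 0.0, 1.0)
ex2 = midpoint(lambda u: (a11 / u) ** 2 * dens11_u(u), 0.0, 1.0)
print(area11, mean11, a11 * (1 + m11) / (2 + m11), ex2 - mean11 ** 2)  # mean vs [13:66]
```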

Type V. β₁ = 4S(2+S). A leptokurtic curve of limited range in one direction. When N = 1, origin at the left boundary when μ₃ > 0, and range from 0 to ∞, the equation is

$$y = y_0\, x^{-p} e^{-\gamma/x} \qquad \text{Type V} \qquad [13:67]$$

$$p = 4 + \frac{8 + 4\sqrt{\beta_1 + 4}}{\beta_1} \qquad [13:68]$$

$$\gamma = (p-2)\,\sigma\sqrt{p-3} \qquad \text{Sign of } \gamma \text{ to be the same as that of } \mu_3 \qquad [13:69]$$

$$y_0 = \frac{\gamma^{p-1}}{\Gamma(p-1)} \qquad \text{This constant is not an ordinate at any real point} \qquad [13:70]$$

$$M = \frac{\gamma}{p-2} \qquad [13:71]$$

$$M - M_o = \frac{2\gamma}{p(p-2)} \qquad [13:72]$$

As a sample we write down the Type V equation when N = 1; σ = 1; S = 1/6 (so that β₁ = 13/9); origin = 0. Then p = 16; γ = 14√13; y₀ = (14√13)¹⁵/Γ(15); M = √13; M_o = (7/8)√13; and

$$y = \frac{(14\sqrt{13})^{15}}{\Gamma(15)}\; x^{-16} \exp\left(-\frac{14\sqrt{13}}{x}\right)$$
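The chain of constants for Type V is easy to mechanize. This Python sketch (the function name is hypothetical; y₀ is computed through logarithms because it is very large) reproduces the sample constants p = 16, γ = 14√13, M = √13, taking β₁ = 13/9, the value that [13:68] requires for p = 16, and σ = 1. It also confirms that γ from [13:69] gives the curve unit variance.

```python
import math


def type_v_constants(beta1, sigma=1.0):
    """Constants of y = y0 * x**(-p) * exp(-gamma/x), N = 1, origin at the
    left boundary (mu3 > 0), following [13:68] to [13:72]."""
    p = 4 + (8 + 4 * math.sqrt(beta1 + 4)) / beta1            # [13:68]
    gamma = (p - 2) * sigma * math.sqrt(p - 3)                # [13:69]
    log_y0 = (p - 1) * math.log(gamma) - math.lgamma(p - 1)   # [13:70], in logs
    mean = gamma / (p - 2)                                    # [13:71]
    mode = gamma / p                                          # so M - Mo = 2*gamma/(p*(p-2))
    return p, gamma, math.exp(log_y0), mean, mode


p, gamma, y0, mean, mode = type_v_constants(13 / 9)
variance = gamma ** 2 / ((p - 2) ** 2 * (p - 3))              # variance of the Type V curve
# approximately 16, 14, sqrt(13), 7*sqrt(13)/8, and 1:
print(p, gamma / math.sqrt(13), mean, mode, variance)
```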


The four-parameter Types I, IV, and VI. For the fitting of these we refer to the original memoirs, or to Elderton, op. cit.

Type I is given by points in the Craig Chart for which S < 0. The general equation, with origin at the mode, is

$$y = y_0\left(1 + \frac{x}{a_1}\right)^{m_1}\left(1 - \frac{x}{a_2}\right)^{m_2} \qquad \text{Type I} \qquad [13:73]$$

Type VI is given by points in the region below the line S = 0 and above the line β₁ = 4S(2+S). The general equation, with origin at a before the start of the curve, is

$$y = y_0\,(x-a)^{q_1}\,x^{-q_2} \qquad \text{Type VI} \qquad [13:74]$$

Type IV is given by points in the region below the line β₁ = 4S(2+S) and to the right of the line β₁ = 0. This is the only region in which the range is unlimited in both directions. The general equation, with origin at νa/r beyond the mean, is

$$y = y_0\left(1 + \frac{x^2}{a^2}\right)^{-m} \exp\left[-\nu\tan^{-1}\left(\frac{x}{a}\right)\right] \qquad [13:75]$$
SECTION 8. THE VARIANCE ERROR OF A STATISTIC DERIVED FROM
ONE HAVING A KNOWN VARIANCE ERROR

Let K be the statistic having the known variance error V_k, and let D be the statistic derived from K whose variance error, V_d, is desired. In general σ_k/K is a magnitude having a factor 1/√N, 1/√(N−1), or the like, so that, if N is at all appreciable, the square and higher powers of σ_k/K are negligible in comparison with σ_k/K itself. In this case, but only in this case, the methods here given suffice.

The binomial expansion method: Students with no knowledge of the calculus can use this method, and it has a surprisingly wide field of applicability. Using the bar to indicate mean values when many samples are taken, and letting d be the error in the sample value of D and k the error in the sample value of K, we have D = D̄ + d and K = K̄ + k, so that

$$\bar D + d = f(\bar K + k)$$

Expressing f(K̄ + k) as a binomial with positive, negative, integral, or fractional exponents, expanding, and keeping the terms of zero and first degree only in k, as may be done if terms of higher degree than the first in (k/K̄) are negligible in comparison with the first, we obtain

$$\bar D + d = \varphi(\bar K) + k\,\psi(\bar K)$$

The variables when many samples are taken are d and k, so we have

$$V_d = [\psi(\bar K)]^2\,V_k \qquad \text{The variance of a derived statistic as a function of the variance of the given statistic (binomial expansion method)} \qquad [13:76]$$

To illustrate this method let us derive the variance of the standard deviation knowing the variance of the variance.
$$\bar\sigma + s = (\bar V + v)^{1/2} \approx \bar V^{1/2} + \frac{v}{2\sqrt{\bar V}}$$

Computing the variance of the right- and left-hand members, we have

$$V_\sigma = \frac{V_V}{4V} \qquad \text{Variance error of } \sigma \text{ in case } N \text{ is ample} \qquad [13:77]$$
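A Monte Carlo check of [13:77], offered as a sketch rather than as part of the text's argument: draw many normal samples, take each sample's variance v and standard deviation s = √v, and compare the observed variance of s with V_V/(4V). The sample size, number of samples, and seed are arbitrary assumptions; with unit-variance normal data both quantities should also lie near 1/(2N).

```python
import random


def pvar(xs):
    """Variance with divisor len(xs)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


random.seed(1)                  # arbitrary seed, for reproducibility
N, reps = 200, 4000             # assumed sample size and number of samples
variances, sds = [], []
for _ in range(reps):
    sample = [random.gauss(0.0, 1.0) for _ in range(N)]
    v = pvar(sample)
    variances.append(v)
    sds.append(v ** 0.5)

V = sum(variances) / reps       # mean value of the sample variance
V_V = pvar(variances)           # observed variance error of the variance
predicted = V_V / (4 * V)       # [13:77]
observed = pvar(sds)            # observed variance error of the standard deviation
print(predicted, observed, 1 / (2 * N))
```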


Substitution of statistical deviations for calculus differentials method. The degree of accuracy in this procedure is the same as in that just given. Using the same notation as before, it is assumed that the calculus ratio dD/dK is essentially equal to the statistical ratio d/k. Given D = f(K) we obtain the derivative

$$dD = f'(K)\,dK$$

and substitute for the differentials, writing

$$d = f'(K)\,k,$$

thence immediately

$$V_d = [f'(K)]^2\,V_k \qquad \text{The variance of a derived statistic (statistical deviations for differentials method)} \qquad [13:78]$$

To illustrate in the case of σ = √V, the differential equation is

$$d\sigma = \frac{dV}{2\sqrt{V}}$$

Substituting statistical deviations, noting that σ = σ̄ + s and that V = V̄ + v, gives

$$s = \frac{v}{2\sqrt{V}}$$

so that, computing the variance of both members,

$$V_\sigma = \frac{V_V}{4V} \qquad \text{Variance error of } \sigma \text{ in case } N \text{ is ample} \qquad [13:79]$$


Expanding by a Taylor (or Maclaurin) series method. The degree of accuracy depends upon the number of terms of the series kept. When kept to the f′ term the accuracy is the same as in the preceding methods. Using the same notation as before,

$$D = \bar D + d = f(\bar K + k) = f(\bar K) + k f'(\bar K) + \frac{k^2}{2!} f''(\bar K) + \cdots$$

Keeping the expansion to two terms and taking the variance of both members,

$$V_d = [f'(\bar K)]^2\,V_k \qquad \text{The variance of a derived statistic (Taylor's series method)} \qquad [13:80]$$

To illustrate in the case of V:

$$\bar\sigma + s = (\bar V + v)^{1/2} = \bar V^{1/2} + \frac{v}{2\bar V^{1/2}} + \cdots$$

and taking the variance of both members,

$$V_\sigma = \frac{V_V}{4V} \qquad \text{The variance error of } \sigma \text{ in case } N \text{ is ample} \qquad [13:81]$$


Logarithmic differentials method. This is a special case of the substitution of statistical deviations for differentials method, which is very convenient when the derived statistic is a function of products or quotients of other statistics. Let

$$D = K^a L^b M^c \qquad [13:82]$$

wherein the variances and covariances of the variables K, L, and M are known. Taking the logarithm of [13:82] we have

$$\log D = a\log K + b\log L + c\log M \qquad [13:83]$$

Taking logarithmic differentials,

$$\frac{dD}{D} = a\frac{dK}{K} + b\frac{dL}{L} + c\frac{dM}{M}$$

Substituting statistical deviations for differentials, squaring, summing, and dividing by the number of samples to get variances, yields

$$\frac{V_d}{D^2} = a^2\frac{V_k}{K^2} + b^2\frac{V_l}{L^2} + c^2\frac{V_m}{M^2} + 2ab\frac{c_{kl}}{KL} + 2ac\frac{c_{km}}{KM} + 2bc\frac{c_{lm}}{LM} \qquad \text{Variance of a derived statistic (logarithmic differentials method)} \qquad [13:84]$$

An illustration of this method is given in Section 9.
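Formula [13:84] extends to any number of factors: the relative variance V_d/D² is the sum, over every ordered pair of statistics, of exponent × exponent × covariance divided by the product of the two statistics, with variances on the diagonal. A minimal Python sketch; the function name and the numbers in the example are hypothetical.

```python
def log_diff_variance(values, exponents, cov):
    """Relative variance V_d / D**2 of D = product of values[j]**exponents[j],
    following [13:84]; cov[j][k] is the covariance of statistics j and k,
    with the variances on the diagonal."""
    n = len(values)
    total = 0.0
    for j in range(n):
        for k in range(n):   # both (j, k) and (k, j) appear, giving the factor 2
            total += (exponents[j] * exponents[k] * cov[j][k]
                      / (values[j] * values[k]))
    return total


# Hypothetical example: D = K / L, with K and L uncorrelated.
K, L = 10.0, 4.0
V_k, V_l = 0.25, 0.04
rel = log_diff_variance([K, L], [1, -1], [[V_k, 0.0], [0.0, V_l]])
D = K / L
print(D, rel, D ** 2 * rel)   # the last value is the variance error of D
```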

SECTION 9. THE VARIANCE ERROR OF A COEFFICIENT OF 
CORRELATION CORRECTED FOR ATTENUATION 

We shall employ logarithmic differentials in the computation of this variance error. The coefficient of correlation corrected for attenuation is most commonly derived from three observed values: (1) the correlation between the scores upon two tests; (2) the correlation between similar halves of the first test, i.e., the half test reliability; and (3) the half test reliability for the second test. We designate the variables involved as follows:

x₃ and x₅ are the similar halves of x₁, the first measure.

x₄ and x₆ are the similar halves of x₂, the second measure.

x₁ = x₃ + x₅; x₂ = x₄ + x₆; V₃ = V₅; V₄ = V₆; x₃ and x₅ correlate approximately equally with other variables, as do also x₄ and x₆.

For convenience we employ such units that V₃ = V₄ = V₅ = V₆ = 1. Then V₁ = 2 + 2r₃₅ and V₂ = 2 + 2r₄₆. With these units the covariances are: c₃₅ = r₃₅, c₄₆ = r₄₆,

$$c_{12} = 2\sqrt{1+r_{35}}\,\sqrt{1+r_{46}}\;r_{12} = c_{34} + c_{36} + c_{54} + c_{56}$$

$$c_{31} = c_{51} = 1 + r_{35}, \qquad c_{42} = c_{62} = 1 + r_{46}, \qquad c_{32} = c_{52} = c_{41} = c_{61} = c_{12}/2$$

In the formula for the coefficient corrected for attenuation

$$r_{\infty\infty} = \frac{r_{12}}{\sqrt{r_1 r_2}}$$

the r₁ and r₂ are not observed values but values stepped up by the Spearman-Brown formula [11:10]. Before developing a formula for its variance error we must express r_∞∞ in terms of the observed items, thus

$$r_{\infty\infty} = \frac{r_{12}}{\sqrt{\dfrac{2r_{35}}{1+r_{35}}\cdot\dfrac{2r_{46}}{1+r_{46}}}} \qquad [13:85]$$

$$r_{\infty\infty} = \tfrac{1}{2}\,r_{12}\,r_{35}^{-1/2}\,(1+r_{35})^{1/2}\,r_{46}^{-1/2}\,(1+r_{46})^{1/2} \qquad [13:86]$$
Taking logarithmic differentials of [13:86], substituting statistical deviations for differentials, and computing the variance, we obtain

$$\frac{V_{r_{\infty\infty}}}{r_{\infty\infty}^2} = \frac{V_{r_{12}}}{r_{12}^2} + \frac{V_{r_{35}}}{4r_{35}^2(1+r_{35})^2} + \frac{V_{r_{46}}}{4r_{46}^2(1+r_{46})^2} - \frac{c_{r_{12}r_{35}}}{r_{12}\,r_{35}(1+r_{35})} - \frac{c_{r_{12}r_{46}}}{r_{12}\,r_{46}(1+r_{46})} + \frac{c_{r_{35}r_{46}}}{2\,r_{35}(1+r_{35})\,r_{46}(1+r_{46})} \qquad [13:87]$$

If we now substitute variances of correlation coefficients as given by [10:46] and covariances as given by [13:134] we secure, after some simplification,

V =-^(4r 2 +4+4- + T"— ~' 2) [13:88] * 

r coy -.2 _2 r i r r 

0)7 4(JV~2) 12 r 35 46 r 3 5 r 4 6 

Variance error of r_∞∞ when obtained via [13:85] or [13:91]

* B. Babington Smith has kindly checked this derivation.

By a similar process we find that if a correction for attenuation in one variable only is made,

$$r_{1\infty} = \frac{r_{12}}{\sqrt{r_2}} = r_{12}\sqrt{\frac{1+r_{46}}{2r_{46}}} \qquad \text{Coefficient of correlation corrected for attenuation in one variable only} \qquad [13:89]$$


The variance error is 


V 


r 


1 y 



N - 2 


2 

ly 




[13:90] 


Variance error of r_1∞ when obtained via [13:89]


A way equally excellent to [13:85] for utilizing all the data to obtain r_∞∞ is by the use of Yule's formula*

$$r_{\infty\infty} = \frac{\sqrt[4]{r_{34}\,r_{36}\,r_{54}\,r_{56}}}{\sqrt{r_{35}\,r_{46}}} \qquad \text{G. U. Yule's formula for } r_{\infty\infty} \qquad [13:91]$$

The writer has proven that, to the degree of precision given by terms involving 1/(N − 2), the variance error of r_∞∞ thus determined is the same as that of r_∞∞ determined by [13:85].
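The two routes to the corrected coefficient can be compared numerically. In the Python sketch below the half-test correlations are hypothetical, and the four cross correlations are set equal so that the assumption of equal correlation of the halves holds exactly; under that assumption [13:85] and Yule's [13:91] give the same value, .4/√(.6 × .7). The function names are illustrative.

```python
import math


def spearman_brown(r_half):
    """Whole-test reliability stepped up from a half-test correlation."""
    return 2 * r_half / (1 + r_half)


def r_inf_13_85(r12, r35, r46):
    """Coefficient corrected for attenuation in both variables, [13:85]."""
    return r12 / math.sqrt(spearman_brown(r35) * spearman_brown(r46))


def r_inf_yule(r34, r36, r54, r56, r35, r46):
    """Yule's formula, [13:91]."""
    return (r34 * r36 * r54 * r56) ** 0.25 / math.sqrt(r35 * r46)


# Hypothetical correlations, in units with V3 = V4 = V5 = V6 = 1.
r35, r46 = 0.6, 0.7                    # half-test reliabilities
r34 = r36 = r54 = r56 = 0.4            # cross correlations between halves
c12 = r34 + r36 + r54 + r56            # covariance of x1 = x3 + x5 with x2 = x4 + x6
r12 = c12 / math.sqrt((2 + 2 * r35) * (2 + 2 * r46))
via_85 = r_inf_13_85(r12, r35, r46)
via_yule = r_inf_yule(r34, r36, r54, r56, r35, r46)
print(r12, via_85, via_yule)
```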


* Stanley Smith Stevens, H. L. Long, and E. R. Stabler have kindly checked this derivation.

SECTION 10. THE EQUI-PROBABLE AND THE MEAN RANGES FOR
SAMPLES OF DIFFERENT SIZE DRAWN FROM A NORMAL POPULATION

We let the scores, as deviations from the mean in terms of the population standard deviation, be designated x. From Tippett's table (1925) (see also Pearson, 1932) we can find the probability that the value x will be the largest observation for a sample of N (Tippett's tables cover samples of N = 3, 5, 10, 20, 30, 50, 100 (100) 1000). Let us choose a small interval (the writer chose i = .2σ) and designate the probability of x in the interval (x − .5i) to (x + .5i) as f_x, as readily computed from Tippett's tables. Let us arbitrarily choose some range, ra. Let p_{x−ra} = the probability that a measure drawn at random from the normal population < (x − ra). This probability is immediately available from a table of normal probability functions. Let p_x = the probability that a measure drawn at random < x.

Then (p_x − p_{x−ra})/p_x is the probability that, in a sample whose largest measure is x, a second measure > (x − ra), and the probability that the (N − 1) measures other than the measure x are all > (x − ra) is [(p_x − p_{x−ra})/p_x]^{N−1}. Thus the probability that the range in this sample of N > ra is

$$1 - \left[\frac{p_x - p_{x-ra}}{p_x}\right]^{N-1}$$

The frequency of occurrence of this situation is f_x. Finally we sum f_x{1 − [(p_x − p_{x−ra})/p_x]^{N−1}} for all successive values of x to obtain the probability that the observed range will exceed ra. We can repeat this process for different values of ra until we experimentally find that value for which the probability = .5.

To illustrate in the case of N = 10: with ra = 3.1σ the computation described yielded p = .46240, the probability that a range > 3.1σ would be obtained from a sample of 10. Then ra = 3.0σ was investigated, and the probability of a range > 3.0σ was found to be .51228. Linear interpolation between these values to find the range for which p = .5 gives 3.0246σ, the equi-probable range as recorded in Table XIII F herewith. The mean range is a little greater than the equi-probable range. Its value, as given by Tippett, is recorded in the last column.
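The summation just described is a numerical form of an integral, and it can be carried out directly by machine. The Python sketch below is an assumed implementation: it uses math.erf for the normal integral and a trapezoidal rule in place of Tippett's tables, computes the probability that the range exceeds w, locates the equi-probable range by bisection, and computes the mean range from the standard integral E(range) = ∫[1 − Φⁿ − (1 − Φ)ⁿ] dx, which is not given in the text. The function names are hypothetical.

```python
import math


def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def trapezoid(f, lo=-9.0, hi=9.0, h=0.01):
    """Trapezoidal-rule integral of f over [lo, hi]."""
    steps = int(round((hi - lo) / h))
    total = 0.5 * (f(lo) + f(hi)) + sum(f(lo + k * h) for k in range(1, steps))
    return h * total


def prob_range_exceeds(w, n):
    """P(range > w) for a normal sample of n, w in sigma units.

    Continuous form of the text's sum: the largest measure falls at x with
    density n*phi(x)*Phi(x)**(n-1), and the other n - 1 measures all lie
    above x - w with probability [(Phi(x) - Phi(x - w)) / Phi(x)]**(n-1)."""
    inside = trapezoid(lambda x: n * phi(x) * (Phi(x) - Phi(x - w)) ** (n - 1))
    return 1.0 - inside


def equiprobable_range(n, lo=0.0, hi=12.0):
    """Range exceeded with probability .5, located by bisection."""
    for _ in range(40):
        mid = 0.5 * (lo + hi)
        if prob_range_exceeds(mid, n) > 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def mean_range(n):
    """Mean range of a normal sample of n, in sigma units."""
    return trapezoid(lambda x: 1 - Phi(x) ** n - (1 - Phi(x)) ** n)


# The text's linear interpolation between ra = 3.0 and ra = 3.1 for N = 10:
kelley_interp = 3.0 + 0.1 * (0.51228 - 0.5) / (0.51228 - 0.46240)
print(round(kelley_interp, 4), equiprobable_range(10), mean_range(10))
```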
TABLE XIII F

THE EQUI-PROBABLE RANGE, AND THE MEAN RANGE, OF
SAMPLES DRAWN FROM A NORMAL POPULATION
(in terms of the population σ)

N         EQUI-PROBABLE RANGE    MEAN RANGE (FROM TIPPETT)
6         2.46                   2.5344
10        3.0246                 3.0775
25        3.90                   3.9306
50        4.47                   4.4981
100       4.9683                 5.0152
300       5.70                   5.7555
1000      6.4379                 6.4829
10000     7.67
100000    8.78



SECTION 11. THE OPTIMAL SIZE OF INTERVAL FOR HISTOGRAM
OR FREQUENCY POLYGON

The temperature data, Table IV J, are not finely enough graduated to illustrate well the issues involved in connection with the interval for graphic portrayal, so we will deal with the heights (hypothetical) of 100 American male white adults recorded to the nearest .01 of an inch, Table XIII G. With such fine grouping, few if any classes will have a frequency greater than 1. A plot with this very fine grouping is shown in Chart XIII V,

CHART XIII V

[Chart: the heights plotted at .01-inch intervals; horizontal axis from 50 to 70 inches]
which, all will agree, is quite uninformative. If we group in intervals of 5 inches, the result is as shown in Chart XIII VI, which again is not very informative.

CHART XIII VI

We clearly desire to group using some interval greater than .01 inch and less than 5 inches. If we group using .5-inch intervals the curve of Chart XIII VII results.

CHART XIII VII

To the trained statistician, aware of the fluctuations to be expected from sampling, this might be quite satisfactory. However, graphic portrayal is essentially a popular device and is supposed to convey information to the statistically uninitiated. If such a person sees Chart XIII VII he likely will interpret as significant all the small fluctuations from class to class. He certainly has no standard as to where to draw the line between significant and insignificant fluctuations. This is in fact a difficult line for the well-trained statistician to draw. The original measurements, we will say, are as given in Table XIII G.

TABLE XIII G

52.63   56.45   57.68   58.67   59.84
53.30   56.59   57.76   58.73   59.91
53.47   56.67   57.77   58.74   59.96
53.91   56.68   57.81   58.81   60.13
54.09   56.77   57.84   58.87   60.30
54.35   56.81   57.90   58.97   60.37
54.72   56.86   57.98   58.98   60.45
54.78   56.91   58.00   59.06   60.67
55.07   56.95   58.04   59.13   60.81
55.17   57.11   58.09   59.19   60.83
55.43   57.13   58.13   59.26   60.93
55.81   57.17   58.20   59.32   61.09
55.89   57.24   58.25   59.39   61.37
55.96   57.27   58.27   59.44   61.59
56.08   57.36   58.33   59.50   61.61
56.11   57.43   58.36   59.51   61.75
56.19   57.48   58.41   59.63   62.41
56.23   57.49   58.51   59.69   62.58
56.34   57.53   58.55   59.71   62.87
56.36   57.63   58.58   59.84   63.03

CHART XIII VIII

We therefore seek a rule for coarseness of grouping for graphic portrayal which will result in a frequency polygon which, probably, will not be misleading to the untutored reader who interprets the directions of the frequency fluctuations shown as meaningful. If the graph looks like that of Chart XIII VIII, we desire that the reader have at least an even chance of being right if he believes that in the population the frequency in class x₂ exceeds that in x₁ and falls short of that in x₃. Of all the possible
comparisons between class frequencies we may say 
that the edict that is greater than f 2 or f ^ 
is the one most likely to be made, and the one 
above all others that we should seek to make 
trustworthy. 

The foregoing is not quite as refined a statement of the problem as is necessary, because it does not cover a case that approximates that of Chart XIII IX. Here the naive judgment is that the mode lies between x₂ and x₃. We should hope that this judgment would be one having a 50:50 or better chance of being right.

CHART XIII IX



We will now state specifically the problem set: it is to determine the coarseness of grouping to employ so that the observed mode (intermediate between two class indexes if the naive judgment so places it) of a sample of N drawn from a normal population has a probability of .5 of lying in an interval having the true mode as its class index. If we can determine this interval, then we may conclude that the mode shown in a graphic representation using this interval, of a sample drawn from a normal population, will have a probability of .5 of not being in error by more than one-half the interval used, i.e., P.E.(Mo) = .5i if the parent population is normal.
Except for the infrequent samplings yielding a greater frequency in a class removed two or more intervals from the true modal class than in the modal class, this statement is equivalent to the following: Given a normal population grouped in classes, with interval i (for convenience expressed in terms of σ as the unit), and with the true mode as the class index of one of the classes, then the desired interval i is such that the probability is .5 that, in a sample, f₂ is greater than f₁ and also greater than f₃.

If f₂ > f₁ then (f₂ − f₁) > 0, and if f₂ > f₃ then (f₂ − f₃) > 0. In a scatter diagram with (f₂ − f₃) and (f₂ − f₁) as axes, the relative frequency above the line f₂ − f₁ = 0 is the probability that f₂ > f₁, the relative frequency to the right of the line f₂ − f₃ = 0 is the probability that f₂ > f₃, and the relative frequency in the upper right sector is the probability that f₂ > f₁ and also f₂ > f₃.

Specifically employing an interval, i, let us be given three successive classes, with sample frequencies f₁, f₂, and f₃, such that the class index of the second class coincides with the parent normal population mode. Let N = the number in the sample. Let f̂₁, f̂₂, and f̂₃, in which f̂₃ = f̂₁, be the population frequencies in these three classes. In the population f̂₂ is the largest, but because of the vicissitudes of sampling, f₂ may not be the largest in the sample. The greater the interval, the more likely it is that f₂ will be the largest. We shall determine the interval, associated with a given N, for which the probability is .5 that f₂ is the largest. We shall consider this the desirable interval for graphic portrayal because then, except for the small chance that some other class frequency exceeds f₁, f₂, or f₃, we have at least an even chance that the sample mode will not differ by more than half an interval from the true mode. Since it would only be by rare chance that the class index of the second class would coincide with the population mode, we must consider the sample mode to be that obtained by smoothing the frequencies in the four classes constituting the modal group for the sample, by some such process as [7:21] and [7:24].

CHART XIII XI

The correlation surface, Chart XIII XI, will be approximately normal if the f's are not too small. Assuming that many samplings are made, the constants of this correlation situation are readily available. They are as given herewith, in which a circumflex over a symbol indicates a population value, and M, V, and r are the mean, variance, and correlation. Also p = f̂/N and q = 1 − p.

$$\hat f_2 - \hat f_1 = \hat f_2 - \hat f_3 = M_{f_2-f_1} = M_{f_2-f_3}$$

$$N(p_1q_1 + p_2q_2 + 2p_1p_2) = V_{f_2-f_1} = V_{f_2-f_3}$$

$$r_{(f_2-f_1)(f_2-f_3)} = \frac{p_2q_2 + 2p_1p_2 - p_1^2}{p_2q_2 + p_1q_1 + 2p_1p_2}$$

In terms of the standard deviation, the point of dichotomy at which f₂ − f₁ = 0 is M_{f₂−f₁}/σ_{f₂−f₁} below the mean of the distribution, and similarly for the point in the other variable at which f₂ − f₃ = 0. It is now a simple matter to determine i by interpolating between the tables by Lee et alia (Pearson, 1914) giving the proportionate frequencies in upper right quadrangles for different dichotomies and different correlations. This has been done for certain values of N, as reported in Table XIII H herewith.
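In place of interpolation in the Lee tables, the upper-right probability can be computed by one-dimensional numerical integration, P(X > h and Y > h) = ∫ φ(x) Φ((rx − h)/√(1 − r²)) dx taken over x > h. The Python sketch below (function names hypothetical) builds p₁, p₂, the point of dichotomy, and r from the constants above for a middle class whose index is at the mode, then searches by bisection for the interval giving P = .5. It rests on the same normal approximation as the text, so for N = 100, for example, it should give an interval close to the tabled .50402, though small differences from the interpolated table values are to be expected.

```python
import math


def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def upper_right(h, r, steps=2000):
    """P(X > h and Y > h) for a standard bivariate normal with correlation r,
    by the midpoint rule on x from max(h, -8) to 8."""
    lo, hi = max(h, -8.0), 8.0
    if lo >= hi:
        return 0.0
    s = math.sqrt(1 - r * r)
    dx = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * dx
        total += phi(x) * Phi((r * x - h) / s)     # P(Y > h | X = x)
    return total * dx


def prob_middle_largest(i, n):
    """Normal-approximation probability that f2 exceeds both f1 and f3;
    i is the class interval in sigma units."""
    p2 = Phi(i / 2) - Phi(-i / 2)
    p1 = Phi(1.5 * i) - Phi(i / 2)                 # p3 = p1 by symmetry
    q1, q2 = 1 - p1, 1 - p2
    v = p1 * q1 + p2 * q2 + 2 * p1 * p2            # V(f2 - f1) / N
    r = (p2 * q2 + 2 * p1 * p2 - p1 * p1) / v      # r of (f2 - f1) with (f2 - f3)
    h = -math.sqrt(n) * (p2 - p1) / math.sqrt(v)   # dichotomy: -M / sigma
    return upper_right(h, r)


def interval_for_half(n, lo=0.01, hi=3.0):
    """Interval (sigma units) giving probability .5, found by bisection."""
    for _ in range(40):
        mid = 0.5 * (lo + hi)
        if prob_middle_largest(mid, n) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for n in (10, 100, 1000):
    print(n, round(interval_for_half(n), 4))
```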
TABLE XIII H

THE SIZE (in σ units) OF THE INTERVAL GIVING A PROBABILITY OF .5 THAT THE FREQUENCY OF THE MEDIAN INTERVAL SHALL EXCEED THAT OF EITHER NEIGHBORING INTERVAL, IN THE CASE OF A NORMAL DISTRIBUTION; THE EXPECTED RANGE (in σ units); AND THE NUMBER OF CLASSES NECESSARY TO COVER THIS RANGE

N         i GIVING P = .5    Ra, EXPECTED RANGE    NO. OF CLASSES = Ra/i
6         .90562             2.46                  2.72
10        .81215             3.0246                3.724
25        .67082             3.90                  5.81
50        .58140             4.47                  7.69
100       .50402             4.9683                9.857
300       .40347             5.70                  14.13
1000      .31226             6.4379                20.62
10000     .19490             7.67                  39.4
100000    .12499             8.78                  70.


The equi-probable ranges given in the third column 
are those derived in the immediately preceding 
Section 10. The values in the fourth column, 
rounded off to the nearest integer, give the 
practical answer that we have sought, the num- 
ber of classes to employ in the graphic portrayal 
of samples of different sizes. The information 
of the last column is repeated in Chapter IV, 
Table IV M, in a form which is simpler to use. 
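As an illustration of the rule, the Python sketch below groups the 100 heights of Table XIII G with the tabled interval for N = 100, i = .50402σ. Estimating σ by the sample standard deviation is an assumption (the text does not say how σ is to be estimated), and starting the first class at the smallest measurement is an arbitrary choice.

```python
import math

# Table XIII G: hypothetical heights (inches) of 100 adult males, in order.
heights = [
    52.63, 53.30, 53.47, 53.91, 54.09, 54.35, 54.72, 54.78, 55.07, 55.17,
    55.43, 55.81, 55.89, 55.96, 56.08, 56.11, 56.19, 56.23, 56.34, 56.36,
    56.45, 56.59, 56.67, 56.68, 56.77, 56.81, 56.86, 56.91, 56.95, 57.11,
    57.13, 57.17, 57.24, 57.27, 57.36, 57.43, 57.48, 57.49, 57.53, 57.63,
    57.68, 57.76, 57.77, 57.81, 57.84, 57.90, 57.98, 58.00, 58.04, 58.09,
    58.13, 58.20, 58.25, 58.27, 58.33, 58.36, 58.41, 58.51, 58.55, 58.58,
    58.67, 58.73, 58.74, 58.81, 58.87, 58.97, 58.98, 59.06, 59.13, 59.19,
    59.26, 59.32, 59.39, 59.44, 59.50, 59.51, 59.63, 59.69, 59.71, 59.84,
    59.84, 59.91, 59.96, 60.13, 60.30, 60.37, 60.45, 60.67, 60.81, 60.83,
    60.93, 61.09, 61.37, 61.59, 61.61, 61.75, 62.41, 62.58, 62.87, 63.03,
]

N = len(heights)
mean = sum(heights) / N
sd = math.sqrt(sum((x - mean) ** 2 for x in heights) / N)  # estimate of sigma
i = 0.50402 * sd                       # Table XIII H, N = 100
start = min(heights)                   # lower boundary of the first class
n_classes = math.floor((max(heights) - start) / i) + 1
counts = [0] * n_classes
for x in heights:
    counts[min(int((x - start) // i), n_classes - 1)] += 1
print(round(sd, 3), round(i, 3), n_classes, counts)
```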

SECTION 12. DIRECT AND INVERSE INTERPOLATION

There are many methods of interpolation where intervals are equal. A time-saving method for direct interpolation is by means of tabled Lagrangian interpolation coefficients, as exemplified in Tables XV A and XV B.

There are various methods available when two-way interpolation is called for. One serviceable method is by successive one-way interpolations.
Satisfactory direct interpolation when intervals are unequal is more complicated. For one treatment of this matter, see E. T. Whittaker and G. Robinson, The Calculus of Observations, 1924, Ch. II.

Inverse interpolation occurs when the tabled values (unequal intervals) are used for purposes of entry and the answer sought is in terms of the variable (equally spaced) which is usually the argument. Linear inverse interpolation is simple, but if it is insufficiently accurate, quadric inverse interpolation as later described may be used.

a is the argument for which a value t is desired, or in the inverse problem t is the argument for which a is desired; a₀ < a < a₁. The interval in the equally spaced arguments is i.

$$i = a_1 - a_0 \qquad [13:92]$$

The fractional distance of the a₀ to a₁ interval to reach a is p.

$$p = \frac{a - a_0}{a_1 - a_0} = \frac{a - a_0}{i}, \qquad \text{also } q = 1 - p \qquad [13:93]$$

For inverse interpolation

$$p' = \frac{t - t_0}{t_1 - t_0}, \qquad \text{also } q' = 1 - p' \qquad [13:94]$$

When it is necessary to distinguish between a's and t's for different points they are designated a_n and t_n. When it is necessary to distinguish between t's gotten by two-, three-, four-point, etc. interpolation they are designated t″_p, t‴_p, tⁱᵛ_p, etc. When it is necessary to distinguish between a's gotten by two-, three-, and four-point inverse interpolation they are designated a″, a‴, and aⁱᵛ.
We define symbols as in Table XIII I and as in the following defining equations.

TABLE XIII I

NOTATION FOR ARGUMENTS, TABLED ENTRIES AND DIFFERENCES

[Difference table: arguments a₋₂, a₋₁, a₀, a₁, a₂, a₃; the corresponding tabled entries; and differences of the 1st through 6th orders, each entered between the rows from which it is formed.]

In connection with inverse interpolation we require A and B following:
[13:95]


-A 1 ^ 1 

S = — + 3A 2 [13:96] 

A? 


We shall judge of the accuracy of interpola- 
tion by a parabola of a certain degree by com- 
paring the result obtained thereby with that ob- 
tained by using a parabola of the next higher 
degree. This comparison generally leads to a 
slight, but only slight, overstatement of the 
error. 

The following formulas give successively closer approximations to the correct value t_p.

$$t''_p = (1-p)\,t_0 + p\,t_1 \qquad \text{Two-point, or linearly interpolated value} \qquad [13:97]$$

$$t'''_p = p(1-p)(2-p)\left[\frac{t_0}{2p} + \frac{t_1}{1-p} - \frac{t_2}{2(2-p)}\right] \qquad \text{Three-point, or quadric value} \qquad [13:98]$$

If p < .5 interpolate in the other direction, using t₁, t₀, and t₋₁.

$$t^{iv}_p = \tfrac{1}{2}\,p(1-p^2)(2-p)\left[-\frac{t_{-1}}{3(1+p)} + \frac{t_0}{p} + \frac{t_1}{1-p} - \frac{t_2}{3(2-p)}\right] \qquad \text{Four-point, or cubic value} \qquad [13:99]$$

Formulas [13:98] and [13:99] are of the forms

$$t'''_p = c_0 t_0 + c_1 t_1 - c_2 t_2 \qquad [13:98a]$$

$$t^{iv}_p = -c_{-1} t_{-1} + c_0 t_0 + c_1 t_1 - c_2 t_2 \qquad [13:99a]$$

These c-coefficients are known as Lagrangian interpolation coefficients. Tables giving them for interpolation up to eleven-point and for different values of p are available. See Chapter XV, Tables XV A and XV B, and the references given in Chapter XV, Section 1.
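The Lagrangian coefficients can be generated for any set of tabled points, which makes [13:97] to [13:99a] easy to check. In this Python sketch (function names hypothetical) a general coefficient routine is compared with the closed forms [13:98] and [13:99]; a cubic is reproduced exactly by four-point interpolation. The closed forms divide by p and 1 − p, so they cannot be evaluated at p = 0 or p = 1, where no interpolation is needed anyway.

```python
def lagrange_coefficients(nodes, p):
    """Lagrangian interpolation coefficients for entries tabled at the integer
    offsets in `nodes` (in units of the interval i), evaluated at fraction p."""
    coeffs = []
    for j in nodes:
        c = 1.0
        for k in nodes:
            if k != j:
                c *= (p - k) / (j - k)
        coeffs.append(c)
    return coeffs


def interpolate(table, nodes, p):
    """table maps offset n to t_n; returns the interpolated value at p."""
    return sum(c * table[n] for c, n in zip(lagrange_coefficients(nodes, p), nodes))


def three_point(t0, t1, t2, p):
    """[13:98], from t0, t1, t2, for 0 < p < 1."""
    return p * (1 - p) * (2 - p) * (t0 / (2 * p) + t1 / (1 - p) - t2 / (2 * (2 - p)))


def four_point(tm1, t0, t1, t2, p):
    """[13:99], from t-1, t0, t1, t2, for 0 < p < 1."""
    return 0.5 * p * (1 - p * p) * (2 - p) * (
        -tm1 / (3 * (1 + p)) + t0 / p + t1 / (1 - p) - t2 / (3 * (2 - p)))


cube = {n: float(n ** 3) for n in range(-1, 3)}
p = 0.3
print(lagrange_coefficients([0, 1, 2], 0.5))   # c0, c1, and -c2 of [13:98a] at p = .5
print(interpolate(cube, [-1, 0, 1, 2], p),
      four_point(cube[-1], cube[0], cube[1], cube[2], p), p ** 3)
```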


A good approximation to the error in t″_p is (t‴_p − t″_p); in t‴_p it is (tⁱᵛ_p − t‴_p); and in tⁱᵛ_p it is (tᵛ_p − tⁱᵛ_p). These errors are maximal when p = .5. In this case we designate them E″, E‴, and Eⁱᵛ.

$$E'' = \tfrac{1}{8}(-t_0 + 2t_1 - t_2) \qquad \text{Maximal error in linear interpolation} \qquad [13:100]$$

$$E''' = \tfrac{1}{16}(t_{-1} - 3t_0 + 3t_1 - t_2) \qquad \text{Maximal error in three-point interpolation} \qquad [13:101]$$

$$E^{iv} = \tfrac{3}{128}(t_{-2} - 4t_{-1} + 6t_0 - 4t_1 + t_2) \qquad \text{Maximal error in four-point interpolation} \qquad [13:102]$$

For Table XV C of the normal probability functions given in Chapter XV, Section 3, the E″ and E‴ values recorded at the bottom of columns give these maximal linear and quadric interpolation errors as determined from values approximately halfway down the columns.

For inverse interpolation, i.e., the procedure for obtaining a knowing t, the following formulas hold:
$$a'' = a_0 + i\,p' \qquad \text{Linear inverse interpolation} \qquad [13:103]$$


■? i i . , , A p'q' 

= + i(p' + — — ) 


2 + 3A + A" 


or approximately 


a"" 1 = a 0 + i[p' + .25 pVA(2-3A)] [13:104] 

Inverse quadric interpolation 

Defining the error in inverse two-point interpolation as the difference between the three-point and two-point inverse interpolation answers, and designating the maximum value of this error E⁻″, we have

$$E^{-''} = .125\,i\,|A| \qquad \text{Maximum error in inverse linear interpolation} \qquad [13:105]$$

Similarly,

$$E^{-'''} = .0625\,i\,|B| \qquad \text{Maximum error in inverse quadric interpolation} \qquad [13:106]$$
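As an alternative to the constants A and B of [13:95] and [13:96], the sketch below uses a swapped-in method for quadric inverse interpolation: starting from the linear answer [13:103], it solves the three-point relation t = t₀ + pΔ₁ + p(p − 1)Δ₂/2 (the Newton forward form of [13:98], with Δ₁ = t₁ − t₀ and Δ₂ = t₂ − 2t₁ + t₀) for p by fixed-point iteration. The example, inverting the normal integral tabled at a = .2, .3, .4 to find where it equals .6, is hypothetical, and the function names are illustrative.

```python
import math


def Phi(x):
    """Standard normal cumulative distribution (plays the tabled function)."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def inverse_linear(a0, i, t0, t1, t):
    """[13:103]: a'' = a0 + i p', with p' = (t - t0) / (t1 - t0)."""
    return a0 + i * (t - t0) / (t1 - t0)


def inverse_three_point(a0, i, t0, t1, t2, t, iterations=20):
    """Solve t = t0 + p*d1 + p*(p - 1)/2*d2 for p by fixed-point iteration
    (an assumed method, not the one based on A and B)."""
    d1, d2 = t1 - t0, t2 - 2 * t1 + t0
    p = (t - t0) / d1                    # start from p'
    for _ in range(iterations):
        p = (t - t0 - p * (p - 1) / 2 * d2) / d1
    return a0 + i * p


t0, t1, t2 = Phi(0.2), Phi(0.3), Phi(0.4)
a_lin = inverse_linear(0.2, 0.1, t0, t1, 0.6)
a_quad = inverse_three_point(0.2, 0.1, t0, t1, t2, 0.6)
print(a_lin, a_quad)
```

The quadric answer should lie much closer than the linear one to the true argument, since it allows for the curvature of the tabled function.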


SECTION 13. THE MACHINE EXTRACTION OF SQUARE AND CUBE ROOTS

A number of excellent methods are available, so the only question is that of expedition of process.

The extraction of square root by the following procedure is very rapid. We assume a computing machine with no trick devices. It has a keyboard, a product dial or register, and a rotation counter dial or register, which latter is sometimes called the multiplier, or quotient, dial. To illustrate the process let us desire the square root of 123.456.

Set 123.456 in the product dial.

Set 11. (an inspectional estimate of the answer) in the keyboard.

Divide, keeping the quotient to 4 figures (twice the number in the trial root), obtaining 11.22.

Make mental note of 11.11, the average of 11. and 11.22.

Clear the rotation counter by multiplying by 11.22, thus restoring 123.456 to the product dial.

Change the keyboard to 11.11 and divide, obtaining 11.112151.

The average of this and 11.11, namely 11.111076, is the desired root correct to eight figures, which is twice as many figures as the number which is the same in the second approximation (11.11) and the quotient (11.112151).
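Arithmetically, the machine routine is Heron's (Newton's) iteration for square roots: divide by the trial root and average divisor and quotient, each pass roughly doubling the number of correct figures. A Python sketch (function name hypothetical), without the intermediate rounding to 4 figures used on the machine:

```python
import math


def machine_square_root(x, estimate, passes=2):
    """Repeatedly divide x by the trial root and average divisor and quotient."""
    root = estimate
    for _ in range(passes):
        quotient = x / root
        root = (root + quotient) / 2
    return root


# The text's second pass, with its trial root rounded to 11.11:
second_pass = (11.11 + 123.456 / 11.11) / 2
print(round(second_pass, 6), machine_square_root(123.456, 11.0), math.sqrt(123.456))
```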

Let us extract the cube root of 123.456. If a slide rule or brief table of cube roots (Chapter XV, Table D) can be used to get a first estimate, the process will be shortened by one or two iterations. The slide rule gives 4.98.

Set 4.98 in the keyboard and square. The product dial reading is 24.8004, but this does not need to be remembered or recorded.

The carriage is returned so that a rotation will register in the units' place.

The keyboard is cleared.

The multiplication bar is depressed once, not changing the product reading, but increasing the 4.98 in the counter dial by 1 to 5.98.

The keyboard is set to be identical with the product reading.

The subtraction is made, changing the counter dial to 4.98 and clearing the product dial. Glance at the product dial to see that this is so.

Set the "division" lever.

Multiply by such a number, namely 4.98, as clears the rotation counter dial. The product dial now holds the cube of 4.98.

If this cube is greater than 123.456 set the "division" lever, or button, and if less set the "multiplication" lever. In the present problem this cube = 123.50599, so something must be subtracted from it. We accordingly set the "division" lever.

Make sufficient additions or subtractions (in this case subtractions) to make the product dial reading = 123.456. When this is accomplished we find −.002016 (minus because the "division" lever is operating) appearing in the counter dial.

One-third of this, namely −.000672, is the correction to 4.98. Accordingly the answer is 4.979328. This is correct to twice as many figures as the trial root was correct, i.e., it is correct to the last figure. Otherwise stated, 4.9793280 differs from 4.980 in the fourth significant figure, so 4.9793280 is in error in the eighth significant figure.

If the value at this point is not accurate to enough decimal places, repeat the process using the improved estimate of the cube root. Usually two such computations will suffice to give eight-figure accuracy.
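Arithmetically, the counter-dial reading is (r³ − x)/r² for the trial root r, and subtracting one-third of it from r is one step of Newton's iteration for a cube root, r − (r³ − x)/(3r²). A Python sketch (function name hypothetical) reproducing the example:

```python
def cube_root_step(x, r):
    """One correction of the trial cube root r of x: the counter-dial
    reading is (r**3 - x) / r**2, and one-third of it is subtracted."""
    counter = (r ** 3 - x) / r ** 2
    return r - counter / 3


r1 = cube_root_step(123.456, 4.98)   # first correction, from the slide-rule value
r2 = cube_root_step(123.456, r1)     # a second pass, if more figures are wanted
print(r1, r2, r2 ** 3)
```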

SECTION 14. OCCASIONAL FORMULAS

Since the proportion, p, of cases in a class is the frequency, f, in that class divided by N, and since, under conditions of sampling in which N is made constant from sample to sample, the distributional characteristics of p are simply those of f divided by the constant N, we can immediately derive statistics of p from those of f, as given in Chapter IX, formulas [9:02], [9:03], and [9:04]. We have Np = f; N²p² = f²; etc., so that

$$V_p = \frac{pq}{N} \qquad \text{Variance of a proportion} \qquad [13:107]$$

$$\mu_3(p) = \frac{pq(q-p)}{N^2} \qquad \text{Third moment of a proportion} \qquad [13:108]$$

$$\mu_4(p) = \frac{pq[1 + 3(N-2)pq]}{N^3} \qquad \text{Fourth moment of a proportion} \qquad [13:109]$$
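The three formulas can be confirmed by computing the central moments of f/N exactly from the binomial distribution. A small Python check, with N = 7 and p = .3 chosen arbitrarily and a hypothetical function name:

```python
from math import comb


def exact_moments_of_p(N, p):
    """Second, third, and fourth central moments of the sample proportion
    f/N, computed exactly from the binomial distribution of f."""
    probs = [comb(N, f) * p ** f * (1 - p) ** (N - f) for f in range(N + 1)]
    deviations = [f / N - p for f in range(N + 1)]
    return [sum(w * d ** k for w, d in zip(probs, deviations)) for k in (2, 3, 4)]


N, p = 7, 0.3
q = 1 - p
exact = exact_moments_of_p(N, p)
formulas = [p * q / N,                                    # [13:107]
            p * q * (q - p) / N ** 2,                     # [13:108]
            p * q * (1 + 3 * (N - 2) * p * q) / N ** 3]   # [13:109]
print(exact, formulas)
```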


Occasionally one needs the regression, the 
correlation, or the covariance, between the fre- 
quencies, f # , f b , or the proportions, P 3 > P b , in 
two classes. Having the variances of the fre- 
quencies t ferula [9:02], both regression and 
correlation become available shell the covariance 
is known. We here den ve the covariance, df f b J, 
between the frequencies in two classes by first 
deriving the regression of one of these frequen- 
cies upon the other. . . - 

Let all classes except the two in question be considered as a single class $c$. The sample frequencies are $f_a$, $f_b$, $f_c$, such that $f_a + f_b + f_c = N$. The true, or population, frequencies are $\tilde f_a$, $\tilde f_b$, $\tilde f_c$, such that $\tilde f_a + \tilde f_b + \tilde f_c = N$. Letting $\Delta$'s represent deviations in frequency from true values, we write $f_a = \tilde f_a + \Delta_a$ and $f_b = \tilde f_b + \Delta_b$. When $\Delta_b$ is some positive number of cases there must be an exactly compensating deficiency in the other two classes, and in the long run the deficiency that attaches to the $a$ class will bear the same ratio to the deficiency that attaches to the $c$ class that $\tilde f_a$ bears to $\tilde f_c$. That is, the regression of $\Delta_a$ upon $\Delta_b$ is given by the equation

$$\tilde\Delta_a = -\frac{\tilde f_a}{\tilde f_a + \tilde f_c}\,\Delta_b$$


This is the regression of $f_a$ upon $f_b$, and after making simple substitutions the regression coefficient can be written

Regression of one cell frequency upon a second non-overlapping frequency:

$$b(f_af_b) = -\frac{p_a}{p_a + p_c} = -\frac{p_a}{1 - p_b} \tag{13:110}$$
From this, utilizing the appropriate variances and standard deviations, we immediately obtain

Correlation between non-overlapping cell frequencies, or proportions:

$$r(f_af_b) = r(p_ap_b) = -\sqrt{\frac{p_ap_b}{q_aq_b}} \tag{13:111}$$

Covariance between non-overlapping cell frequencies:

$$C(f_af_b) = -Np_ap_b \tag{13:112}$$

Covariance between non-overlapping cell proportions:

$$C(p_ap_b) = -\frac{p_ap_b}{N} \tag{13:113}$$


The preceding results may be applied to the frequencies, or the proportions, in a two-dimensional contingency table. In case one of the frequencies is a part of the second, this second can be split into two parts, one correlating perfectly with the first frequency and the other part having the correlation [13:111]. Making allowance for this, we can obtain the covariances for all the type situations that arise in a two-dimensional contingency table. From these covariances one can readily compute the covariances between proportions and also the correlations between frequencies and the correlations between proportions. We designate the marginal totals for the rows $f_a$, $f_b$, etc., those for the columns $f_{a'}$, $f_{b'}$, etc., and that for the cell lying at the intersection of the $a$ row and the $a'$ column $f_{aa'}$, and similarly for cells at other intersections. It is simple to establish that

$$C(f_{aa'}f_{bb'}) = -Np_{aa'}p_{bb'}$$

this being [13:112] in the new notation;

Covariance between the frequency in a row and the frequency in a cell in that row:

$$C(f_af_{aa'}) = Np_{aa'}q_a \tag{13:114}$$

Covariance between the frequency in a row and the frequency in a column:

$$C(f_af_{a'}) = N(p_{aa'} - p_ap_{a'}) \tag{13:115}$$

If the first variable is quantitative, having a mean, $M_1$, and a deviation of row $a$ from the mean, $x_a$, then, as established by Pearson (1913),

Covariance between a mean and a cell frequency in a scatter diagram:

$$C(M_1f_{aa'}) = p_{aa'}x_a \tag{13:116}$$
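Formulas [13:112] and [13:114] to [13:116] can be verified exactly for a small case by enumerating every possible multinomial sample. The sketch below assumes, purely for illustration, a 2 × 2 table whose cells, in the order aa', ab', ba', bb', have probabilities .1, .2, .3, .4, with $N = 6$ and hypothetical row scores 1 and 3 for rows $a$ and $b$:

```python
from itertools import product
from math import factorial, prod

N = 6
PROBS = (0.1, 0.2, 0.3, 0.4)   # cells aa', ab', ba', bb' (illustrative values)
SCORES = {"a": 1.0, "b": 3.0}  # hypothetical scores of rows a and b


def pmf(counts):
    """Multinomial probability of one table of cell counts."""
    coef = factorial(N)
    for c in counts:
        coef //= factorial(c)
    return coef * prod(p**c for p, c in zip(PROBS, counts))


TABLES = [t for t in product(range(N + 1), repeat=4) if sum(t) == N]
WEIGHTS = [pmf(t) for t in TABLES]


def cov(g, h):
    """Exact covariance of two functions of the cell counts over all possible tables."""
    mg = sum(w * g(t) for w, t in zip(WEIGHTS, TABLES))
    mh = sum(w * h(t) for w, t in zip(WEIGHTS, TABLES))
    return sum(w * (g(t) - mg) * (h(t) - mh) for w, t in zip(WEIGHTS, TABLES))


def f_aa(t):
    return t[0]                      # cell aa'


def f_bb(t):
    return t[3]                      # cell bb'


def f_a(t):
    return t[0] + t[1]               # row a total


def f_a1(t):
    return t[0] + t[2]               # column a' total


def mean_1(t):
    # Sample mean M_1 of the first (row) variable.
    return (SCORES["a"] * f_a(t) + SCORES["b"] * (t[2] + t[3])) / N


p_aa, p_bb = PROBS[0], PROBS[3]
p_a, p_a1 = PROBS[0] + PROBS[1], PROBS[0] + PROBS[2]
x_a = SCORES["a"] - (SCORES["a"] * p_a + SCORES["b"] * (1 - p_a))

print(cov(f_aa, f_bb), -N * p_aa * p_bb)          # [13:112], new notation
print(cov(f_a, f_aa), N * p_aa * (1 - p_a))       # [13:114]
print(cov(f_a, f_a1), N * (p_aa - p_a * p_a1))    # [13:115]
print(cov(mean_1, f_aa), p_aa * x_a)              # [13:116]
```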


If both variables are quantitative, the true regression equation of $X_1$ upon $X_2$ is

$$\tilde X_1 = b_{12}X_2 + \tilde M_1 - b_{12}\tilde M_2$$

This equation holds for every observed $X_2$ value. Thus for individual $i$ in a sample of $N$ we have

$$\tilde X_{1i} = b_{12}X_{2i} + \tilde M_1 - b_{12}\tilde M_2$$

If such equations are written down for all the $N$ cases in the sample, added, and divided by $N$, we have, by noting that $\sum X_2 = NM_2$,

$$\frac{\sum \tilde X_1}{N} = b_{12}M_2 + \tilde M_1 - b_{12}\tilde M_2$$

This establishes that the parameters in the regression of $M_1$ upon $M_2$ are the same as those in the regression of $X_1$ upon $X_2$. Thus the regression of means is the same as the regression of variables, and it follows that the correlation between means is the same as that between variables when samples are randomly drawn from a single parent population. The best approximations to these true parameters are the sample values, yielding

$$b(M_1M_2) = b_{12} \tag{13:117}$$

$$r(M_1M_2) = r_{12} \tag{13:118}$$

$$C(M_1M_2) = \frac{c_{12}}{N} \tag{13:119}$$
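A quick Monte Carlo check of [13:118], assuming a standardized bivariate normal parent population (our choice, for illustration only): draw many samples of size $N$, record the two sample means of each, and correlate the means across samples. The result should lie close to the population $r_{12}$. Standard-library Python; the function names are hypothetical:

```python
import math
import random


def pearson_r(xs, ys):
    """Product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_of_sample_means(rho, sample_size, n_samples, seed=1):
    """Correlation, across many random samples, between the means M_1 and M_2."""
    rng = random.Random(seed)
    k = math.sqrt(1 - rho * rho)
    means1, means2 = [], []
    for _ in range(n_samples):
        x1 = [rng.gauss(0, 1) for _ in range(sample_size)]
        x2 = [rho * u + k * rng.gauss(0, 1) for u in x1]  # correlation rho with x1
        means1.append(sum(x1) / sample_size)
        means2.append(sum(x2) / sample_size)
    return pearson_r(means1, means2)


print(correlation_of_sample_means(0.6, sample_size=10, n_samples=4000))  # near 0.6
```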
If in an experimental situation, such, e.g., as a sampling of public opinion, both $r(M_1M_2)$ and $r_{12}$ can be independently calculated, and when so calculated are found to differ significantly, it is established that the successive samples have not been random samples from the same parent population.

We here compute a regression, correlation, and covariance between two variances. We consider the population distribution and take $\tilde M_1$ and $\tilde M_2$ as the origins. We note that the mean $x_1$ for an array is $b_{12}x_2$. Since $x_1 = b_{12}x_2 + x_{1\cdot2}$, this yields the following regression of $x_1^2$ upon $x_2^2$:

$$\widetilde{x_1^2} = b_{12}^2x_2^2 + \tilde V_{1\cdot2}$$

Summing for a sample of $N$, and dividing by $N$, we obtain

$$V_1 = b_{12}^2V_2 + V_{1\cdot2} \tag{13:120}$$

in which $V_{1\cdot2} = (\sum x_{1\cdot2}^2)/N$ for the sample of $N$ cases. It is an unbiased estimate of $\tilde V_{1\cdot2}$, which fact we may express thus: $V_{1\cdot2} = \tilde V_{1\cdot2} + \theta$, in which $\theta$ is an unbiased estimate of 0, i.e., the mean of these $\theta$'s for many samples approaches 0. Thus from [13:120] we get

Regression of variances:

$$b(V_1V_2) = b_{12}^2 \tag{13:121}$$

Correlation between variances:

$$r(V_1V_2) = r_{12}^2\,\frac{V_1\,\sigma(V_2)}{V_2\,\sigma(V_1)} \tag{13:122}$$

Covariance between variances:

$$C(V_1V_2) = b_{12}^2\,\sigma^2(V_2) \tag{13:123}$$

For the best value of $r_{12}$ based upon sample data we may use [11:114]. For the best sample estimates of $V_1$ and $V_2$ we may use [6:09], and for $\sigma^2(V_1)$ and $\sigma^2(V_2)$ we note that the unbiased estimate of a variance is $\frac{N}{N-1}V$, and that

$$\sigma^2\!\left(\frac{N}{N-1}V\right) = \frac{N^2}{(N-1)^2}\,\sigma^2(V)$$

so multiplying the right-hand member of [6:55] by $N^2/(N-1)^2$ will yield the desired estimates of the true variances needed in [13:123].

Further, if we deal with situations in which $N$ is large and in which we may assume that the parent bivariate population is normal, the preceding formulas simplify, yielding:

$$b(V_1V_2) = b_{12}^2 \qquad \text{($N$ large and bivariate normal population)} \tag{13:124}$$

$$r(V_1V_2) = r_{12}^2 \qquad \text{($N$ large and bivariate normal population)} \tag{13:125}$$

$$C(V_1V_2) = \frac{2c_{12}^2}{N} \qquad \text{($N$ large and bivariate normal population; see [13:148])} \tag{13:126}$$


Further variance, correlation, and covariance formulas have been derived by Karl Pearson (1913) and are here given.

We employ the $P_{ij}$ and the $p_{ij}$ notation as given in Chapter XI, Section 4. That is, $P_{10} = M_1$; $P_{01} = M_2$; $p_{11} = c_{12}$; $p_{20} = V_1$; $p_{02} = V_2$; etc. The variance error of any product moment, when $N$ is large (that is, when terms of order $1/N^2$, $1/N^3$, etc., are neglected), is given by
$$N\,V(p_{ij}) = p_{2i,2j} - p_{ij}^2 + i^2p_{20}p_{i-1,j}^2 + j^2p_{02}p_{i,j-1}^2 + 2ij\,p_{11}p_{i-1,j}p_{i,j-1} - 2i\,p_{i+1,j}p_{i-1,j} - 2j\,p_{i,j+1}p_{i,j-1} \tag{13:127}$$

($N$ large) Variance error of any product moment from the means

The covariance between any two product moments, each involving the same two variables, is given by

$$\begin{aligned} N\,C(p_{gh}p_{ij}) ={}& p_{g+i,h+j} - p_{gh}p_{ij} + gi\,p_{20}p_{g-1,h}p_{i-1,j} + hj\,p_{02}p_{g,h-1}p_{i,j-1} \\ &+ gj\,p_{11}p_{g-1,h}p_{i,j-1} + hi\,p_{11}p_{g,h-1}p_{i-1,j} - i\,p_{g+1,h}p_{i-1,j} \\ &- j\,p_{g,h+1}p_{i,j-1} - g\,p_{i+1,j}p_{g-1,h} - h\,p_{i,j+1}p_{g,h-1} \qquad (N \text{ large}) \end{aligned} \tag{13:128}$$

Covariance of product moments from the means (two variables)

Since formulas [13:127] and [13:128] pertain to deviations from the means, none of $g$, $h$, $i$, or $j$ may equal 1. Formulas applying without this restriction, as given by Karl Pearson (op. cit.), are [13:129] and [13:130]. As in the two preceding formulas, $g$ and $i$ refer to the powers of a first variable and $h$ and $j$ to the powers of a second variable. When the origins of these variables are fixed points we have the variance formula [13:129] and the covariance formula [13:130]. In each instance $N$ should be ample.
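Formula [13:127] is mechanical enough to code directly. The sketch below evaluates $N\,V(p_{ij})$ from a user-supplied function returning population central product moments. To test it, it assumes a standardized bivariate normal population, whose moments follow from writing $x_2 = r x_1 + \sqrt{1-r^2}\,z$ with $z$ independent of $x_1$. All function names are illustrative:

```python
from math import comb


def normal_moment(m: int) -> float:
    """E[z**m] for a standard normal z: (m - 1)!! for even m, 0 for odd m."""
    if m % 2:
        return 0.0
    result = 1.0
    for k in range(m - 1, 0, -2):
        result *= k
    return result


def bivariate_normal_moment(i: int, j: int, r: float) -> float:
    """Central product moment p_ij of a standardized bivariate normal population."""
    k = (1.0 - r * r) ** 0.5
    # x2 = r*x1 + k*z with z independent of x1; expand x2**j binomially.
    return sum(
        comb(j, t) * r**t * k ** (j - t) * normal_moment(i + t) * normal_moment(j - t)
        for t in range(j + 1)
    )


def n_var_product_moment(p, i: int, j: int) -> float:
    """N times the large-sample variance of the product moment p_ij, formula [13:127]."""
    def q(g, h):
        # Moments with a negative index only occur multiplied by a zero coefficient.
        return 0.0 if g < 0 or h < 0 else p(g, h)

    return (q(2 * i, 2 * j) - q(i, j) ** 2
            + i * i * q(2, 0) * q(i - 1, j) ** 2
            + j * j * q(0, 2) * q(i, j - 1) ** 2
            + 2 * i * j * q(1, 1) * q(i - 1, j) * q(i, j - 1)
            - 2 * i * q(i + 1, j) * q(i - 1, j)
            - 2 * j * q(i, j + 1) * q(i, j - 1))


def moments(g, h):
    return bivariate_normal_moment(g, h, 0.5)


print(round(n_var_product_moment(moments, 2, 0), 10))  # 2.0
print(round(n_var_product_moment(moments, 1, 1), 10))  # 1.25
```

With $r = .5$ the two calls give $N\,V(V_1) = 2V_1^2 = 2$ and $N\,V(c_{12}) = V_1V_2 + c_{12}^2 = 1.25$, the large-$N$ forms of Wishart's [13:147] and [13:149] quoted later in this section.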
$$N\,V(P_{ij}) = P_{2i,2j} - P_{ij}^2 \tag{13:129}$$

$$N\,C(P_{gh}P_{ij}) = P_{g+i,h+j} - P_{gh}P_{ij} \tag{13:130}$$


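Special cases of the preceding formulas, in all of which terms of order smaller than $1/N$ are neglected, and certain further formulas are given herewith:

$$p_{22} = V_1V_2(1 + 2r_{12}^2) \qquad \text{(assumption of normality)} \tag{13:131}$$

$$r(\sigma_1\sigma_2) = r_{12}^2 \qquad \text{(assumption of normality)} \tag{13:132}$$

$$p_{13} = 3r_{12}\sigma_1\sigma_2^3 \quad \text{(assumption of normality)}; \qquad p_{13} = r_{12}\sigma_1\sigma_2^3\,\beta_{2(\text{2nd var.})} \quad \text{(assumption of rectilinearity only)} \tag{13:133}$$

$$r(M_1\sigma_1) = 0 \qquad \text{(assumption of symmetry)} \tag{13:134}$$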


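$$r(M_1\sigma_2) = r(M_2\sigma_1) = 0 \qquad \text{(assumption of normality)} \tag{13:135}$$

$$r(r_{12}\sigma_1) = \frac{r_{12}}{\sqrt2} \qquad \text{(assumption of normality)} \tag{13:136}$$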


r ( r iPi > 


r ■* r, ^^2 (1st var. ) ^ *"l y &?!'>-■> i 1 


■ 12 


12 ^”2 (2nd var.) 


2(l-r^; 


f 2 1 
r 12 — 


[1-. 25(/3 2(lst) + /3 2( 2 n d )~6 _ ] 




[13:137] 

$$r(r_{12}M_1) = 0 \qquad \text{(assumption of normality)} \tag{13:138}$$

Let $a = M_1 - b_{12}M_2$, the constant term in a regression equation; then

$$C(ab_{12}) = -\frac{M_2V_1(1 - r_{12}^2)}{NV_2} \qquad \text{(assumption of normality)} \tag{13:139}$$

Involving three variables we have

$$r(\sigma_1r_{23}) = \frac{r_{12}(r_{13} - r_{12}r_{23}) + r_{13}(r_{12} - r_{13}r_{23})}{(1 - r_{23}^2)\sqrt2} \qquad \text{(assumption of normality)} \tag{13:140}$$

$$2N\,C(r_{12}r_{13}) = 2r_{23}(1 - r_{12}^2 - r_{13}^2) - r_{12}r_{13}(1 - r_{12}^2 - r_{13}^2 - r_{23}^2) \qquad \text{(assumption of normality)} \tag{13:141}$$

Involving four variables we have

$$\begin{aligned} N\,C(r_{12}r_{34}) ={}& r_{13}r_{24} + r_{14}r_{23} - r_{12}(r_{13}r_{14} + r_{23}r_{24}) - r_{34}(r_{13}r_{23} + r_{14}r_{24}) \\ &+ \tfrac12 r_{12}r_{34}(r_{13}^2 + r_{14}^2 + r_{23}^2 + r_{24}^2) \qquad \text{(assumption of normality)} \end{aligned} \tag{13:142}$$

Formulas [13:140], [13:141], and [13:142] were first given by Filon and Pearson (1898, pp. 229-311).

Romanowsky (1925) gives the following formula for the variance of a regression coefficient:

$$V(b_{12}) = \frac{V_1(1 - r_{12}^2)}{V_2(N - 3)} \qquad \text{cf. [10:41]} \tag{13:143}$$

Karl Pearson (op. cit.) gives the following variance for the constant term in a two-variable regression equation:

$$V(a) = (V_2 + M_2^2)\,V(b_{12}) \qquad \text{(assumption of normality)} \tag{13:144}$$


The following formula for the variance error of $r$, holding for any form of bivariate distribution if $N$ is large, was first given by Sheppard (1898):

$$N\,V(r_{12}) = r_{12}^2\left[\frac{p_{22} - p_{11}^2}{p_{11}^2} + \frac{p_{40} - p_{20}^2}{4p_{20}^2} + \frac{p_{04} - p_{02}^2}{4p_{02}^2} + \frac{p_{22} - p_{20}p_{02}}{2p_{20}p_{02}} - \frac{p_{31} - p_{11}p_{20}}{p_{11}p_{20}} - \frac{p_{13} - p_{11}p_{02}}{p_{11}p_{02}}\right] \tag{13:145}$$
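Sheppard's formula [13:145] can likewise be evaluated from a table of central moments. As a check, the sketch below feeds it the moments of a standardized bivariate normal population ($p_{20} = p_{02} = 1$, $p_{11} = r$, $p_{22} = 1 + 2r^2$, $p_{40} = p_{04} = 3$, $p_{31} = p_{13} = 3r$), for which the result should reduce to the familiar $(1 - r^2)^2$. The function names are ours, and the formula requires $r \ne 0$ because $p_{11}$ appears in denominators:

```python
def sheppard_n_var_r(p) -> float:
    """N * V(r_12) for large N and any bivariate form, formula [13:145].

    p maps (i, j) to the population central product moment p_ij; p_11 must be nonzero.
    """
    p11, p20, p02, p22 = p[1, 1], p[2, 0], p[0, 2], p[2, 2]
    r_squared = p11 ** 2 / (p20 * p02)
    bracket = (
        (p22 - p11 ** 2) / p11 ** 2
        + (p[4, 0] - p20 ** 2) / (4 * p20 ** 2)
        + (p[0, 4] - p02 ** 2) / (4 * p02 ** 2)
        + (p22 - p20 * p02) / (2 * p20 * p02)
        - (p[3, 1] - p11 * p20) / (p11 * p20)
        - (p[1, 3] - p11 * p02) / (p11 * p02)
    )
    return r_squared * bracket


def normal_moments(r: float) -> dict:
    """Central product moments of a standardized bivariate normal population."""
    return {(1, 1): r, (2, 0): 1.0, (0, 2): 1.0, (2, 2): 1 + 2 * r * r,
            (4, 0): 3.0, (0, 4): 3.0, (3, 1): 3 * r, (1, 3): 3 * r}


for r in (0.2, 0.5, 0.8):
    print(r, sheppard_n_var_r(normal_moments(r)), (1 - r * r) ** 2)
```

For a non-normal population one would simply supply the actual fourth-order moments; that is where the formula earns its generality.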


Formulas for a number of the preceding standard errors and correlations, not involving the assumption of normality, are given by Isserlis (1916). In another article (1918) he gives reduction formulas for higher product moments in normal multivariate distributions, such, for example, as for the mean product of four variables, $\sum x_1x_2x_3x_4/N$.

The eight following formulas, holding for samples from a normal bivariate population, are from Wishart (1928):

$$N\bar c_{12} = (N-1)\tilde c_{12}, \quad \text{or, in other notation,} \quad N\bar p_{11} = (N-1)\tilde p_{11} \qquad \text{(normal bivariate population)} \tag{13:146}$$

in which the bar denotes the mean over many samples and the tilde the population value.

$$N^2V(V_1) = 2(N-1)V_1^2 \qquad \text{See [6:58]} \tag{13:147}$$

$$N^2C(V_1V_2) = 2(N-1)c_{12}^2 \qquad \text{See [13:126] (normal bivariate population)} \tag{13:148}$$

$$N^2V(c_{12}) = (N-1)(V_1V_2 + c_{12}^2) \qquad \text{(normal bivariate population)} \tag{13:149}$$

$$N^2C(V_1c_{12}) = 2(N-1)V_1c_{12} \qquad \text{(normal bivariate population)} \tag{13:150}$$

$$N^2C(V_1c_{23}) = 2(N-1)c_{12}c_{13} \qquad \text{(normal trivariate population)} \tag{13:151}$$

$$N^2C(c_{12}c_{13}) = (N-1)(V_1c_{23} + c_{12}c_{13}) \qquad \text{(normal trivariate population)} \tag{13:152}$$

$$N^2C(c_{12}c_{34}) = (N-1)(c_{13}c_{24} + c_{14}c_{23}) \qquad \text{(normal quadrivariate population)} \tag{13:153}$$

Sundry uni-, bi-, and multivariate formulas for various moments and product moments, holding for small samples from populations of any form, are given by Pepper (1929) and by Feldman (1935).
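Wishart's [13:146] and [13:149] lend themselves to a simulation check. The sketch assumes a standardized bivariate normal population with correlation $r$ (so $V_1 = V_2 = 1$ and $c_{12} = r$), computes each sample $c_{12}$ with divisor $N$ as in the text, and compares the mean and variance of many such covariances with $(N-1)c_{12}/N$ and $(N-1)(V_1V_2 + c_{12}^2)/N^2$. The names, seed, and sample sizes are illustrative:

```python
import math
import random


def simulate_sample_covariances(r, n, trials, seed=7):
    """Mean and variance, over many samples, of c_12 = (sum of deviation products) / N."""
    rng = random.Random(seed)
    k = math.sqrt(1 - r * r)
    covs = []
    for _ in range(trials):
        x1 = [rng.gauss(0, 1) for _ in range(n)]
        x2 = [r * u + k * rng.gauss(0, 1) for u in x1]
        m1, m2 = sum(x1) / n, sum(x2) / n
        covs.append(sum((u - m1) * (v - m2) for u, v in zip(x1, x2)) / n)
    mean_c = sum(covs) / trials
    var_c = sum((c - mean_c) ** 2 for c in covs) / trials
    return mean_c, var_c


r, n = 0.5, 10
mean_c, var_c = simulate_sample_covariances(r, n, trials=20000)
print(mean_c, (n - 1) * r / n)                 # [13:146]
print(var_c, (n - 1) * (1 + r * r) / n ** 2)   # [13:149] with V1 = V2 = 1
```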

SECTION 15. SEQUENTIAL ANALYSIS

This is a method wherein observations of the items in a lot are taken in succession until the cumulated evidence warrants the acceptance or rejection of the lot. This inspectional use of sequential analysis differs only in minor aspects from its use in experimentation and in quality control.

We here consider its use in connection with inspection of a product which can be classified as defective or non-defective, and in which the number of cases in the lot is large in comparison with the undetermined, but much smaller, number inspected. Adaptation to the inspection of a product having a graduated score as, for example, the tensile strength of a piece of metal or the scholastic standing of a college student, is, in general, simple.

The abscissa of Chart XIII XII is the number 
of items inspected and the ordinate is the num- 
ber of defective items. The broken line indi- 
cates the results as found after one item, two 
items, three items, etc. have been inspected. 
This sample number line is also a time line show- 
ing the amount of information given at each suc- 
cessive inspectional stage. All the information 
due to earlier inspections is inherent in the 
status shown by the last plotted point on the 
line. For any point, the ordinate divided by the 
abscissa gives the proportion of defective items, 
an unbiased estimate of the true proportion in 
the lot. Of course this estimate is the more 
reliable the more items are inspected. 

The proportion of defectives found at any stage of inspection is an unbiased estimate of the proportion prevailing in the lot, but when the number of items inspected is small the estimate is quite unreliable, particularly when the proportion is very high or very low.

Prior to any knowledge of the actual number of defectives found after $n$ inspections, the ordinate at $n$ can be divided into three parts: an upper portion wherein the number of defectives is so great that the lot is rejected, a middle portion wherein decision is reserved, and a lower portion wherein the number is so small that the lot is accepted. This statement holds after the first few inspections, for in practical situations a final decision either way would not be made upon the basis of one or two or other small number of inspections, even if all the items inspected were perfect or all imperfect. Since the division of every ordinate into these three parts is possible,
we obtain, by connecting homologous points, the three regions shown on Chart XIII XII. It is necessary to lay down specific conditions so that the lines demarking these regions may be specifically defined. Two lines, those for $d_1$ and $d_2$, which, conveniently, turn out to be straight lines, have particular merit. These lines are functions of $p_1$, $p_2$, $\alpha$, and $\beta$, defined as follows:

$p_1$ is a proportion of defectives so small that a lot having this proportion is considered acceptable.

$\alpha$ is the risk of rejecting a lot in which $p = p_1$. This risk is generally small, for it is considered highly undesirable to reject so excellent a lot.

$p_2$ is a proportion of defectives, greater than $p_1$, of such size that a lot having this proportion is considered unacceptable.

$\beta$ is the risk of accepting a lot in which $p = p_2$.

$\alpha + \beta < 1$.

The operating characteristic, or OC, curve, as shown in Chart XIII XIII, indicates the relationship between $p_1$, $p_2$, $\alpha$, and $\beta$. The ordinate $L_p$ is the relative frequency of acceptance of a lot having a proportion $p$ of defectives under the particular sampling plan chosen. The ideal OC curve would be a step: there would be some value $p'$ ($p' = s$ as given by [13:156]) such that all lots having a smaller $p$ would be accepted indubitably and all lots having a larger $p$ rejected. Such a situation can be brought about if the inspection plan calls for sampling the entire lot.

It may be observed at this point that standard statistical procedures antedating the development of sequential analysis would endeavor to estimate from a sample the value of $p$ in a lot, compare it with a standard proportion $s$, co-determinate, as shown later, with $p_1$ and $p_2$, and draw a conclusion as to acceptance or rejection. In standard statistics each of the two hypotheses can be tested independently. In sequential analysis a decision between the two is forced and must be accepted. Properly used, there is no systematic difference in outcomes, but it has been pointed out that the sequential analysis method will generally result in savings in the number of cases sampled in the neighborhood of 50 per cent. The question of the task placed upon the executive who decides upon the sampling scheme is also pertinent. If it is a sequential analysis scheme, he must choose, from a priori considerations, the crucial values $p_1$, $p_2$, $\alpha$, and $\beta$, while if it is a standard scheme he must choose $s$ and acceptable probability limits both for $p$ below and for $p$ above $s$.

[Chart XIII XII: Regions of acceptance and rejection]

The bounding lines $d_1$ and $d_2$ of Chart XIII XII are functions of $h_1$ and $h_2$, which are functions of $p_1$, $p_2$, $\alpha$, and $\beta$, and of $s$, which is a function of $p_1$ and $p_2$ only.
$$h_1 = \frac{\log\dfrac{1-\alpha}{\beta}}{\log\dfrac{p_2}{p_1} + \log\dfrac{1-p_1}{1-p_2}} \tag{13:154}$$

$$h_2 = \frac{\log\dfrac{1-\beta}{\alpha}}{\log\dfrac{p_2}{p_1} + \log\dfrac{1-p_1}{1-p_2}} \tag{13:155}$$

$$s = \frac{\log\dfrac{1-p_1}{1-p_2}}{\log\dfrac{p_2}{p_1} + \log\dfrac{1-p_1}{1-p_2}} \tag{13:156}$$

$$d_1 = -h_1 + sn \qquad \text{Below which is acceptance region} \tag{13:157}$$

$$d_2 = h_2 + sn \qquad \text{Above which is rejection region} \tag{13:158}$$
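Formulas [13:154] to [13:158] translate directly into code. In this Python sketch (function names hypothetical) the common denominator is computed once; natural logarithms are used, but any base gives the same results, since only ratios of logarithms enter. The values plugged in are those of the bolt-inspection problem given later in this section:

```python
import math


def sprt_binomial_lines(p1, p2, alpha, beta):
    """Return (h1, h2, s) of [13:154]-[13:156] for the binomial sequential test."""
    denom = math.log(p2 / p1) + math.log((1 - p1) / (1 - p2))
    h1 = math.log((1 - alpha) / beta) / denom
    h2 = math.log((1 - beta) / alpha) / denom
    s = math.log((1 - p1) / (1 - p2)) / denom
    return h1, h2, s


def boundary_lines(n, h1, h2, s):
    """Acceptance line d1 = -h1 + s*n [13:157] and rejection line d2 = h2 + s*n [13:158]."""
    return -h1 + s * n, h2 + s * n


h1, h2, s = sprt_binomial_lines(p1=0.05, p2=0.15, alpha=0.02, beta=0.04)
print(f"h1 = {h1:.2f}, h2 = {h2:.2f}, s = {s:.4f}")  # about 2.64, 3.20, 0.0919
print(boundary_lines(20, h1, h2, s))
```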


The proof of these formulas is given in the revised report of the Statistical Research Group of the Applied Mathematics Panel (1945). Other important contributions to the early development of sequential analysis are Neyman and Pearson (1936), Hotelling (1941), Bernbaum (1942), Bartky (1943), Wald (1943), (1944, general), (1944, cumulative), (1945), Freeman (1944), and Wald and Wolfowitz (1944), (1945).

The bounding lines of the chart can be plotted prior to the inspection of the first item. If upon inspection the first item is perfect, the point with coordinates (1, 0), that is, $n = 1$ and $d = 0$, is plotted, while if it is defective, the point (1, 1) is plotted. In either case the plotted point lies in the region indicating further inspection as necessary, so a second observation is taken and the outcome plotted, and the process is continued until a plotted point is obtained in the region of rejection or in that of acceptance, thus terminating the inspection.

As an alternative to this graphic procedure, one may draw up a table giving, for each successive value of $n$, a value of $d$ above which lies rejection and a value of $d$ below which lies acceptance.
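The tabular procedure can be sketched in Python as follows. Following the rounding rule described for Table XIII J, the rejection number at each $n$ is the next integer at or above $h_2 + sn$ and the acceptance number the next integer at or below $-h_1 + sn$. The defaults are the rounded values of the bolt example, and the function name is hypothetical:

```python
import math


def sequential_inspection(items, h1=2.64, h2=3.20, s=0.0919):
    """Inspect items (1 = defective, 0 = good) one at a time.

    Returns ("accept" | "reject" | "continue", number of items inspected).
    """
    d = 0
    for n, item in enumerate(items, start=1):
        d += item
        rejection_number = math.ceil(h2 + s * n)     # from d2 = h2 + s*n
        acceptance_number = math.floor(-h1 + s * n)  # from d1 = -h1 + s*n
        if d >= rejection_number:
            return "reject", n
        if acceptance_number >= 0 and d <= acceptance_number:
            return "accept", n
    return "continue", len(items)


print(sequential_inspection([0] * 40))  # ('accept', 29)
print(sequential_inspection([1] * 10))  # ('reject', 4)
```

An unbroken run of perfect items cannot lead to acceptance before the 29th item, and an unbroken run of defectives cannot lead to rejection before the 4th; these are the minimal numbers $h_1/s$ and $h_2/(1-s)$, rounded up, discussed below.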

The operating characteristic curve (the OC 
curve) of Chart XIII XIII is derivable from the 


CHART XIII XIII 
This is the ratio of the probability of the observed data arising if the second hypothesis, viz. $p = p_2$, is true to the probability of its arising under the alternative hypothesis, viz. $p = p_1$.

Two constant values of $L$, which we designate $A$ and $B$, exist which divide the $m$-dimensional sample space ($m = 1, 2, \ldots, n$) into three mutually exclusive and exhaustive parts, $R_m^0$, $R_m^1$, and $R_m$. Corresponding to the division of the $m$-dimensional space is a division, as shown, of the space in Chart XIII XII. If the sub-samples consist of one observation each, the $m$ dimensions are the $m$ observations, or scores, 0 or 1. $R_m^0$ is the region in the $m$-space wherein the hypothesis $p = p_1$ is tenable with a risk not greater than $\alpha$. $R_m^1$ is the region wherein the alternative hypothesis $p = p_2$ is tenable with a risk not greater than $\beta$. $R_m$ is the remaining region, and herein neither of these hypotheses is tenable with the small risk assigned to it.

The sequential analysis is terminated at the smallest value $n$ of $m$ for which the sample point lies either in $R_m^0$ or in $R_m^1$. If in $R_m^0$, we accept the first hypothesis, and if in $R_m^1$, we accept the second hypothesis.

We may take the following as the values of $A$ and $B$:

$$A = \frac{1-\beta}{\alpha} \tag{13:160}$$

$$B = \frac{\beta}{1-\alpha} \tag{13:161}$$

These are close approximations if $\alpha$ and $\beta$ are small, and the approximation is such as not to decrease the strength of the test, but perhaps to increase very slightly the necessary number of observations. The logarithms of $A$ and $1/B$ are needed, so we write
$$a = \log A = \log\frac{1-\beta}{\alpha} \qquad \text{Note [13:170]} \tag{13:162}$$

and

$$b = \log\frac{1}{B} = \log\frac{1-\alpha}{\beta} \qquad \text{Note [13:171]} \tag{13:163}$$

It will be useful to note that

$$\frac{a}{a+b} = \frac{h_2}{h_1+h_2} \tag{13:164}$$


If, in Chart XIII XII, an ordinate is drawn at any point $n$ and bounded by the lines $d = 0$ and $d = n$, it is divided into three parts. The ratio of the part lying in the reserve decision region to the sum of the parts in the other regions may be computed. This ratio decreases as $n$ increases, since the reserve region has the constant vertical width $h_1 + h_2$ while the whole ordinate grows with $n$. That is to say, for practical purposes, if the sequential analysis is carried far enough, a point outside the reserve decision region will be attained, thus terminating the problem.

We can note that if, after a very large number of observations, a point in the reserve decision region is attained, the ratio of the number of defectives to the number of observations approaches the slope of the lines bounding this region. Thus this slope, $s$ as defined in [13:156], is also $p'$, the standard value of $p$ between $p_1$ and $p_2$ for which it is indifferent whether the lot is accepted or rejected. Note that $s$ is a function of $p_1$ and $p_2$ only and not of the risks $\alpha$ and $\beta$.

Reference to Chart XIII XII shows that the value of $n$ given by the intersection of $d = -h_1 + sn$ and $d = 0$ gives the minimal number of observations which could lead to the acceptance of the lot. The value of $n$ given by the intersection of $d = h_2 + sn$ and $d = n$ gives the minimal number of observations which could lead to the rejection of the lot. Ordinarily some number of observations greater than either of these is necessary to reach a conclusion. The average sample size, $n_p$, necessary to reach a decision varies with $p$ somewhat as indicated in Chart XIII XIV. We let the average sample number (ASN) required when $p = p_1$ be designated $n_{p_1}$, and when $p = p_2$ we designate it $n_{p_2}$. These may be the average sample numbers with which one is most concerned but, since the lot $p$ is likely to lie between $p_1$ and $p_2$, a generally more informative ASN is $n_s$.

[Chart XIII XIV: ASN curve]

Certain ASN values are approximately as follows:

$$n_{p_1} = \frac{(1-\alpha)h_1 - \alpha h_2}{s - p_1} \tag{13:165}$$

$$n_{p_2} = \frac{(1-\beta)h_2 - \beta h_1}{p_2 - s} \tag{13:166}$$

A good idea of the entire ASN curve can be gotten by plotting five points, $n_{p_1}$, $n_{p_2}$, and the following:

$$n_0 = \frac{h_1}{s} \tag{13:167}$$

$$n_1 = \frac{h_2}{1-s} \tag{13:168}$$

$$n_s = \frac{h_1h_2}{s(1-s)} \tag{13:169}$$


$n_s$ is not the maximum point of the curve, but, as indicated in Chart XIII XIV, it lies near this maximum point.

The cost and time arrangements in connection with an inspection scheme should be sufficiently flexible to meet a situation in which the actual number of inspections required to reach a conclusion is 2.5 or 3 times as great as that indicated by the greater of $n_{p_1}$ and $n_{p_2}$.
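The five points used to sketch the ASN curve follow directly from [13:165] to [13:169]. A Python sketch (hypothetical function name), again using the figures of the bolt example given below, which should reproduce $n_s \approx 101$ and $n_{p_2} \approx 51$:

```python
import math


def asn_points(p1, p2, alpha, beta):
    """Approximate average sample numbers [13:165]-[13:169] for the binomial case."""
    denom = math.log(p2 / p1) + math.log((1 - p1) / (1 - p2))
    h1 = math.log((1 - alpha) / beta) / denom   # [13:154]
    h2 = math.log((1 - beta) / alpha) / denom   # [13:155]
    s = math.log((1 - p1) / (1 - p2)) / denom   # [13:156]
    return {
        "n_p1": ((1 - alpha) * h1 - alpha * h2) / (s - p1),  # [13:165]
        "n_p2": ((1 - beta) * h2 - beta * h1) / (p2 - s),    # [13:166]
        "n_0": h1 / s,                                       # [13:167]
        "n_1": h2 / (1 - s),                                 # [13:168]
        "n_s": h1 * h2 / (s * (1 - s)),                      # [13:169]
    }


for name, value in asn_points(0.05, 0.15, 0.02, 0.04).items():
    print(f"{name}: {value:.1f}")
```

Plotting these five values against $p = p_1, p_2, 0, 1, s$ gives the rough outline of the ASN curve.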

The sequential analysis just discussed is that known as the "binomial case" because the scores, defective or non-defective, yield a binomial distribution. We here give a numerical illustration of this case and then discuss the normal case (wherein the distribution of scores is normal), the Poisson, and other cases.

Illustrative numerical problem. A lot of 10,000 bolts is to be inspected to determine whether it shall be accepted or rejected. The standards set by the accepting agency are:

$p_1 = .05$, i.e., if not over 5 per cent are defective with respect to the characteristic in question, the lot is acceptable.

$\alpha = .02$, i.e., the risk of rejecting so good a lot is not greater than .02.

$p_2 = .15$, i.e., if over 15 per cent are defective, the lot is not acceptable.

$\beta = .04$, i.e., the risk of accepting so poor a lot is not greater than .04.

We compute by [13:154], [13:155], [13:156], [13:169], [13:157], and [13:158]:

$h_1 = 2.64$

$h_2 = 3.20$

$s = .09193$

$n_s = 101$; 2 times 101 is judged not to be a prohibitive number of inspections.

$d_1 = -2.64 + .092n$

$d_2 = 3.20 + .092n$

As a result of inspection, the performance curve shown in Chart XIII XII was obtained, leading to rejection after but 17 bolts were inspected. Since 17 is considerably smaller than 51 ($= n_{p_2}$), it is probable that the actual percentage of defectives is in excess of 15.
The tabular treatment of Table XIII J is equivalent to this graphic procedure. Columns $n$, $d_1$, and $d_2$ are drawn up prior to the first inspection. The recorded $d_2$ values are the integral values next higher than the fractional values ordinarily given by [13:158], and the recorded $d_1$ values are the integral values next smaller than the fractional values given by [13:157]. The entries in column $d$ are recorded as the 1st, 2nd, 3rd, etc., inspections are made. Since, after 17 inspections, a value in the $d$ column is attained as great as the value in the $d_2$ column, the lot is rejected. If the lot had been a much better lot, a value would have been attained, after a certain number of inspections, as small as the value in the $d_1$ column, and the lot would have been accepted.

The normal case (involving a one-sided alternative test of the mean): If the score, $X$, attaching to an observation is a normally distributed score with known standard deviation, $\sigma$ (and variance $V$), a precise sequential analysis test of the difference between two means is available. We have

$M_1$ the smaller mean

$M_2$ the larger mean

$\alpha$ the risk of believing a lot to have a mean of $M_2$ when it in fact has a mean of $M_1$

$\beta$ the risk of believing a lot to have a mean of $M_1$ when it in fact has a mean of $M_2$

We here desire, for purposes of acceptance, lots whose means exceed $M_2$, and this is a one-sided alternative. Had we desired lots whose means neither exceeded nor fell short of $M_2$ by more than a certain amount, we would have had a two-sided alternative, a readily soluble problem (SRG Report 255, Sec. 5, 1945).
Let $a$ and $b$ be the following natural logarithms:

$$a = \ln\frac{1-\beta}{\alpha} \qquad \text{Note [13:162]} \tag{13:170}$$

$$b = \ln\frac{1-\alpha}{\beta} \qquad \text{Note [13:163]} \tag{13:171}$$

We have, for the indifferent mean and also for the slope of the acceptance and rejection boundary lines,

$$s = \frac{M_1 + M_2}{2} \tag{13:172}$$

The intercepts of these boundary lines are $h_1$ and $h_2$:

$$h_1 = \frac{bV}{M_2 - M_1} \tag{13:173}$$

$$h_2 = \frac{aV}{M_2 - M_1} \tag{13:174}$$

The acceptance and rejection boundary lines are given by [13:157] and [13:158], with $\sum X$ in place of $d$. The operating characteristic curve (a left-right reflection of a Chart XIII XIII curve, with abscissa labeled $M$ instead of $p$) is

$$L_M = \frac{e^{t_1} - 1}{e^{t_2} - 1} \tag{13:175}$$

in which

$$t_1 = \frac{2(s - M)h_1}{V} \tag{13:176}$$

$$t_2 = \frac{2(s - M)(h_1 + h_2)}{V} \tag{13:177}$$

$L_M$ is the relative frequency of accepting a lot whose mean quality is $M$. $L_M$ is of indeterminate form when $M = s$, but it can be evaluated. We have

$$L_s = \frac{h_1}{h_1 + h_2} \tag{13:178}$$

Average sample numbers are readily obtained:

$$n_M = \frac{L_M(h_1 + h_2) - h_1}{M - s} \tag{13:179}$$

$$n_s = \frac{h_1h_2}{V} \tag{13:180}$$

$$n_{M_1} = \frac{2[(1-\alpha)b - \alpha a]V}{(M_2 - M_1)^2} \tag{13:181}$$

$$n_{M_2} = \frac{2[(1-\beta)a - \beta b]V}{(M_2 - M_1)^2} \tag{13:182}$$
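For the normal case, [13:170] to [13:182] can be collected in one Python sketch (function names hypothetical). The test values assumed here, $M_1 = 0$, $M_2 = 1$, $V = 1$, $\alpha = \beta = .05$, are illustrative only. The operating characteristic should equal $\alpha$ at $M = M_1$ and $1 - \beta$ at $M = M_2$, and the general ASN formula [13:179] should agree there with [13:181] and [13:182]:

```python
import math


def normal_design(m1, m2, variance, alpha, beta):
    """Constants a, b, s, h1, h2 of [13:170]-[13:174]."""
    a = math.log((1 - beta) / alpha)
    b = math.log((1 - alpha) / beta)
    s = (m1 + m2) / 2
    h1 = b * variance / (m2 - m1)
    h2 = a * variance / (m2 - m1)
    return a, b, s, h1, h2


def oc_and_asn(m, m1, m2, variance, alpha, beta):
    """Return (L_M, n_M): probability of accepting M = M2, and average sample number."""
    _, _, s, h1, h2 = normal_design(m1, m2, variance, alpha, beta)
    if math.isclose(m, s):
        return h1 / (h1 + h2), h1 * h2 / variance        # [13:178], [13:180]
    t1 = 2 * (s - m) * h1 / variance                      # [13:176]
    t2 = 2 * (s - m) * (h1 + h2) / variance               # [13:177]
    L = (math.exp(t1) - 1) / (math.exp(t2) - 1)           # [13:175]
    return L, (L * (h1 + h2) - h1) / (m - s)              # [13:179]


for m in (0.0, 0.25, 0.5, 0.75, 1.0):
    L, n = oc_and_asn(m, 0.0, 1.0, 1.0, 0.05, 0.05)
    print(f"M = {m:.2f}: L_M = {L:.3f}, ASN = {n:.1f}")
```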


We need criteria for the acceptance of $M = M_1$ and of $M = M_2$. Since we have a normal distribution, the ratio of the likelihood that the sample in question would arise under the hypothesis $M = M_2$ to its likelihood under the alternative hypothesis $M = M_1$ is simply

$$L = \frac{\exp\left[-\sum(X - M_2)^2/2V\right]}{\exp\left[-\sum(X - M_1)^2/2V\right]} \tag{13:183}$$

Taking natural logarithms, which does not disturb the relationships [13:162] and [13:163], we can readily obtain the following criteria:

$$\sum X - ns \ge \frac{Va}{M_2 - M_1} \qquad \text{Accept } M = M_2 \tag{13:184}$$

$$\sum X - ns \le -\frac{Vb}{M_2 - M_1} \qquad \text{Accept } M = M_1 \tag{13:185}$$

$$-\frac{Vb}{M_2 - M_1} < \sum X - ns < \frac{Va}{M_2 - M_1} \qquad \text{Reserve decision, i.e., continue testing} \tag{13:186}$$


The actual solution may be carried out graphically by means of such a chart as XIII XII or by tabular methods, recording, for successive $n$'s, the values of $\sum X$ which lead to the acceptance of $M = M_2$ and again of $M = M_1$.
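Criteria [13:184] to [13:186] amount to tracking the running quantity $\sum X - ns$ against two fixed thresholds. A Python sketch of the tabular version (function name hypothetical; the scores and design values in the usage line are made up for illustration, and natural logarithms are used as in [13:170] and [13:171]):

```python
import math


def sequential_mean_test(scores, m1, m2, variance, alpha, beta):
    """Sequential test of M = M1 against M = M2 (> M1) for normal scores of known variance.

    Returns ("accept M2" | "accept M1" | "continue", number of observations used).
    """
    a = math.log((1 - beta) / alpha)     # [13:170]
    b = math.log((1 - alpha) / beta)     # [13:171]
    s = (m1 + m2) / 2                    # [13:172]
    upper = variance * a / (m2 - m1)     # threshold of [13:184]
    lower = -variance * b / (m2 - m1)    # threshold of [13:185]
    total = 0.0
    for n, x in enumerate(scores, start=1):
        total += x
        stat = total - n * s             # sum of X minus n*s
        if stat >= upper:
            return "accept M2", n
        if stat <= lower:
            return "accept M1", n
    return "continue", len(scores)       # [13:186]: reserve decision


print(sequential_mean_test([1.5, 1.5, 1.5, 1.5], 0.0, 1.0, 1.0, 0.05, 0.05))  # ('accept M2', 3)
```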

Further applications of sequential analysis: 
The method applies to many situations of which 
the following will be noted: 

Double dichotomies: (SRG 255, Sec. 3, revised, 1945) Herein the problem is that of choosing between two processes, each of which leads to one of two results. An observation (random) of a first-process case is compared with an observation (random) of a second-process case, with one of three outcomes: (a) first process superior, (b) first and second processes equal, or (c) first process inferior. This is continued until the sequential analysis test shows one of the processes to be superior, within the risk limits set.

Test of variability: (SRG 255, Sec. 6, revised, 1945) The question here is to decide whether the standard deviation of a submitted lot differs (within some small assigned risk) from some standard, or acceptable, standard deviation. The solution is dependent upon a normal distribution.

Test in the case of a Poisson distribution: 
(SRG 255, Sec. 7, 1945) The chief hazard here 
lies in the assumption of a Poisson distribution. 
The experimental determination that a Poisson 
distribution is present is in general far more 
difficult than the sequential analysis procedure 
that follows therefrom. 



CHAPTER XIV

MATHEMATICS, THE MENTOR OF STATISTICAL INGENUITY

SECTION 1. INGENUITY IN RESEARCH

Cleverness in laboratory technique, in utilizing unique properties of materials, and in measuring differences rather than gross magnitudes has resulted in many of the great discoveries of modern physical, biological, and even social science. Another field for ingenuity has been relatively neglected. It has been a tacit assumption that the elementary associative and distributive laws of mathematics hold. Generally this has proved serviceable, and where it has not, the disproof of the assumption has frequently been the most direct way of discovering some important truth. There are innumerable relationships between and within mathematical functions, unique to the particular functions employed, which have not been exploited in connection with phenomena. It seems that it has not been customary to look to mathematics for the key to material and biological behavior. The writer believes that mathematical relationships are never the exact key and that living behavior is always more complex and always conditioned by more factors than are included in a mathematical formulation, but nevertheless such formulation may be an approximate key and one of the richest sources for suggestions as to life behavior.

A striking illustration of this principle is 
to be found in the properties of Latin and Graeco- 
Latin squares and in the design of experiments 
as developed by R. A. Fisher (1937) and his
school. With plot designs consisting of rows 
and columns and equal numbers of observations in 
each cell, there is zero correlation between the 
frequencies in the rows and the frequencies in 
the columns. This is a property discovered 
through the study of mathematics. Agricultural 
experimentalists have been shrewd enough to in- 
corporate this principle in their experimental 
designs and they have obtained an economy of 
procedure and a certainty of conviction not pre- 
viously possible. It is in this sense that 
mathematics is to be thought the mentor of sta- 
tistical ingenuity. The use of the properties 
of the normal distribution, the principles of 
sequential analysis, the merits of independent 
variables as availed of in the analysis of vari- 
ance and in multivariate analysis into principal 
components, are a few additional illustrations 
of illuminating transfer from mathematics to 
living phenomena. 
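A minimal Python sketch of the row-and-column orthogonality just described. It assumes a cyclic Latin square (treatment $(i + j) \bmod k$ in row $i$, column $j$), which is only one convenient construction, and the helper names are hypothetical. Across the $k^2$ plots the row index, the column index, and the treatment are mutually uncorrelated, which is what allows row, column, and treatment effects to be separated.

```python
def cyclic_latin_square(k):
    """Hypothetical helper: treatment (i + j) mod k in row i, column j."""
    return [[(i + j) % k for j in range(k)] for i in range(k)]


def is_latin(square):
    """True if every treatment occurs exactly once in each row and each column."""
    k = len(square)
    full = set(range(k))
    rows_ok = all(set(row) == full for row in square)
    cols_ok = all({square[i][j] for i in range(k)} == full for j in range(k))
    return rows_ok and cols_ok


def pearson_r(u, v):
    """Plain product-moment correlation of two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)


k = 5
square = cyclic_latin_square(k)
# One record per plot: its row, its column, and the treatment it received.
rows = [i for i in range(k) for j in range(k)]
cols = [j for i in range(k) for j in range(k)]
treatments = [square[i][j] for i in range(k) for j in range(k)]

print(is_latin(square))                          # True
print(abs(pearson_r(rows, cols)) < 1e-12)        # True
print(abs(pearson_r(rows, treatments)) < 1e-12)  # True
print(abs(pearson_r(cols, treatments)) < 1e-12)  # True
```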

There are innumerable other mathematical prop- 
erties which have never been availed of in ex- 
perimental design and procedure. In the follow- 
ing sections are given a number of mathematical 
functions having unique properties which may be 
suggestive of phenomenological relationships. 
This chapter is intended as a background (though all too brief) for the experimentalist so that, if his data do show some of the symptoms of, say, a particular mathematical function, he can then seek out all the invariant properties of the function and return to his data with the hypothesis that it has parallel invariant properties. Mathematics should be more than a cultivator that is dragged behind a lead horse; it should be the horse itself, and frequently a spirited one, that chooses the ground that is to be cultivated.

SECTION 2. MATRICES AND DETERMINANTS

A few facts about matrices and determinants will assist in understanding many theoretical as well as applied problems in statistics. A matrix is a rectangular array of terms, called elements. It does not have a quantitative value. A matrix may be designated by a single letter (usually a script capital, $\mathcal{A}$, $\mathcal{B}$, etc.), by a complete recording of all its elements, by a partial recording with dots to indicate obvious non-recorded elements, or by a designation of the general element $a_{ij}$, thus: $\|a_{ij}\|$. Also, in the case of a square matrix, it may be designated by giving the terms in the principal diagonal, thus: $\|a_{11}\,a_{22} \cdots a_{kk}\|$.

$$\mathcal{A} = \|a_{11}\,a_{22} \cdots a_{kk}\| = \begin{Vmatrix} a_{11} & a_{12} & \cdots & a_{1k} \\ a_{21} & a_{22} & \cdots & a_{2k} \\ \vdots & \vdots & & \vdots \\ a_{k1} & a_{k2} & \cdots & a_{kk} \end{Vmatrix} \qquad [14:01]$$

The double rules are used to distinguish between a matrix and a determinant.

A matrix having $k$ rows and $l$ columns is of order $k \times l$.

Two matrices are equal only if each element of the one is equal to the element in the corresponding position in the other.

Matrix $\mathcal{A}'$, derived from matrix $\mathcal{A}$ by interchanging rows and columns, is called its transpose. E.g.

$$\mathcal{A} = \begin{Vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{Vmatrix}, \qquad \mathcal{A}' = \begin{Vmatrix} a_{11} & a_{21} \\ a_{12} & a_{22} \\ a_{13} & a_{23} \end{Vmatrix}$$

The adjoint of a square matrix $\mathcal{A}$ is the matrix whose elements are the cofactors (defined later) of the elements of $\mathcal{A}'$.

A square matrix is symmetric if and only if it is equal to its transpose. Thus an element $a_{ij}$ must equal its conjugate $a_{ji}$.

The addition of matrices: If three matrices $\mathcal{A}$, $\mathcal{B}$, and $\mathcal{C}$, of the same order and having elements $a_{ij}$, $b_{ij}$, and $c_{ij}$, are such that $\mathcal{C}$ is the sum of $\mathcal{A}$ and $\mathcal{B}$,

$$\mathcal{C} = \mathcal{A} + \mathcal{B}$$

then $c_{ij} = a_{ij} + b_{ij}$, these elements being added in the ordinary algebraic sense.

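Matrix multiplication by a scalar: If the matrix $\mathcal{A}$ is multiplied by a scalar $b$ (a real number, as distinguished from a vector), then every element in $\mathcal{A}$ is multiplied by $b$; thus, e.g.,

$$\mathcal{A} = \begin{Vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{Vmatrix} \quad\text{and}\quad b\mathcal{A} = \mathcal{A}b = \begin{Vmatrix} ba_{11} & ba_{12} & ba_{13} \\ ba_{21} & ba_{22} & ba_{23} \end{Vmatrix} \qquad [14:03]$$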
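The multiplication of matrices: With matrix-theory meanings attached to the terms pre- and post-multiplication, let the matrix $\mathcal{C}$ be equal to matrix $\mathcal{A}$ post-multiplied by the matrix $\mathcal{B}$. The element in the $i$th row and $j$th column of $\mathcal{C}$ is equal to the sum of the products of the successive elements in the $i$th row of $\mathcal{A}$ with the successive elements in the $j$th column of $\mathcal{B}$. E.g., let

$$\mathcal{A} = \begin{Vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{Vmatrix} \quad\text{and}\quad \mathcal{B} = \begin{Vmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \\ b_{31} & b_{32} \end{Vmatrix}$$

then

$$\mathcal{C} = \mathcal{A}\mathcal{B} = \begin{Vmatrix} a_{11}b_{11} + a_{12}b_{21} + a_{13}b_{31} & a_{11}b_{12} + a_{12}b_{22} + a_{13}b_{32} \\ a_{21}b_{11} + a_{22}b_{21} + a_{23}b_{31} & a_{21}b_{12} + a_{22}b_{22} + a_{23}b_{32} \end{Vmatrix} \qquad [14:04]$$

Here $\mathcal{B}$ is pre-multiplied by $\mathcal{A}$, or $\mathcal{A}$ is post-multiplied by $\mathcal{B}$; the number of columns of $\mathcal{A}$ must equal the number of rows of $\mathcal{B}$. The multiplication operation is not commutative, for in general $\mathcal{A}\mathcal{B} \neq \mathcal{B}\mathcal{A}$.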

The transpose of the product of any number of matrices is equal to the product of their transposes taken in reverse order. E.g., $(\mathcal{A}\mathcal{B}\mathcal{C})' = \mathcal{C}'\mathcal{B}'\mathcal{A}'$.
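The row-by-column rule of [14:04], the failure of commutativity, and the reversal rule for transposes can be checked numerically. This is a plain-Python sketch with small integer matrices chosen only for illustration; `matmul` and `transpose` are hypothetical helper names, not a library API.

```python
def matmul(A, B):
    """Element (i, j) of AB is the sum over r of A[i][r] * B[r][j], as in [14:04]."""
    if len(A[0]) != len(B):
        raise ValueError("number of columns of A must equal number of rows of B")
    return [[sum(A[i][r] * B[r][j] for r in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    """Interchange rows and columns."""
    return [list(col) for col in zip(*A)]


A = [[1, 2, 3], [4, 5, 6]]    # 2 x 3
B = [[1, 0], [2, 1], [0, 3]]  # 3 x 2
C = [[2, 1], [1, 1]]          # 2 x 2

AB = matmul(A, B)
print(AB)                              # [[5, 11], [14, 23]]
print(matmul(AB, C) == matmul(C, AB))  # False: multiplication is not commutative
ABC = matmul(AB, C)
# (ABC)' equals C'B'A'
print(transpose(ABC) == matmul(matmul(transpose(C), transpose(B)), transpose(A)))  # True
```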

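The identity matrix $\mathcal{I}$ is the following square matrix:

$$\mathcal{I} = \begin{Vmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{Vmatrix} \qquad [14:05]$$

It is such that $\mathcal{A}\mathcal{I} = \mathcal{I}\mathcal{A} = \mathcal{A}$.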

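For a square matrix $\mathcal{A}$ with elements $a_{ij}$, let $A$ be the determinant having the same elements $a_{ij}$ as matrix $\mathcal{A}$, and let $A_{ij}$ be the cofactors of the elements $a_{ij}$. Then, provided $A \neq 0$, the inverse of $\mathcal{A}$ is $\mathcal{A}^{-1}$ and is given by

$$\mathcal{A}^{-1} = \frac{1}{A}\begin{Vmatrix} A_{11} & A_{21} & \cdots & A_{k1} \\ A_{12} & A_{22} & \cdots & A_{k2} \\ \vdots & \vdots & & \vdots \\ A_{1k} & A_{2k} & \cdots & A_{kk} \end{Vmatrix} = \frac{1}{A}\,(\text{adjoint of } \mathcal{A}) \qquad [14:06]$$

Note that the element at the intersection of the $i$th row and $j$th column in the adjoint is not the cofactor of $a_{ij}$ but of $a_{ji}$. A matrix and its inverse are such that

$$\mathcal{A}\mathcal{A}^{-1} = \mathcal{A}^{-1}\mathcal{A} = \mathcal{I} \qquad [14:07]$$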

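The multiplication of matrices is associative. E.g., $\mathcal{A}\mathcal{B}\mathcal{C} = \mathcal{A}(\mathcal{B}\mathcal{C}) = (\mathcal{A}\mathcal{B})\mathcal{C}$, the order of the letters being the same in all instances.

The inverse of any product of matrices is the product of their inverses in reverse order. E.g., $(\mathcal{A}\mathcal{B}\mathcal{C})^{-1} = \mathcal{C}^{-1}\mathcal{B}^{-1}\mathcal{A}^{-1}$.

The multiplication of matrices is distributive.

$$\mathcal{A}(\mathcal{B} + \mathcal{C}) = \mathcal{A}\mathcal{B} + \mathcal{A}\mathcal{C} \qquad [14:08]$$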

A diagonal matrix has non-zero elements in the diagonal only. If these elements are the same throughout, the matrix is a scalar matrix, having the property that pre-multiplication by it and post-multiplication by it are the same. Further, if the common element in the diagonal terms of the scalar matrix

$$\begin{Vmatrix} \lambda & 0 & \cdots & 0 \\ 0 & \lambda & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda \end{Vmatrix} = \lambda\mathcal{I} \qquad [14:09]$$

is $\lambda$, then $\mathcal{A}(\lambda\mathcal{I}) = (\lambda\mathcal{I})\mathcal{A} = \lambda\mathcal{A}$, a matrix having elements $\lambda a_{ij}$.

In an equation of matrices a multiplying factor may be moved to the other member of the equation by substituting its inverse, while maintaining its same relative position. E.g., if

$$\mathcal{A}\mathcal{B}\mathcal{C} = \mathcal{D}$$

then

$$\mathcal{B}\mathcal{C} = \mathcal{A}^{-1}\mathcal{D}, \qquad \mathcal{A} = \mathcal{D}(\mathcal{B}\mathcal{C})^{-1} = \mathcal{D}\mathcal{C}^{-1}\mathcal{B}^{-1}, \qquad \mathcal{B} = \mathcal{A}^{-1}\mathcal{D}\mathcal{C}^{-1} \qquad [14:10]$$


Determinants: A matrix, including a square matrix, is just an arrangement of elements, but if the elements in the square matrix $\mathcal{A}$, equal to $\|a_{11}\,a_{22} \cdots a_{kk}\|$, are evaluated as the determinant $A$ (also frequently designated $|a_{11}\,a_{22} \cdots a_{kk}|$),

$$A = |a_{11}\,a_{22} \cdots a_{kk}| = \begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1k} \\ a_{21} & a_{22} & \cdots & a_{2k} \\ \vdots & \vdots & & \vdots \\ a_{k1} & a_{k2} & \cdots & a_{kk} \end{vmatrix} \qquad [14:11]$$

we have an ordinary algebraic polynomial.

A determinant of $k$ rows (and $k$ columns) is one of the $k$th order.

The $k!$ terms in the polynomial $A$ cover all possible combinations yielding terms having one element from each row and each column. The sign factor that attaches to each term can be determined by ascertaining whether the number of inversions in the first subscripts plus the number in the second subscripts is even or odd. If even the sign is positive, and if odd it is negative. E.g., consider the following term in a fifth-order determinant: $a_{15}a_{23}a_{32}a_{41}a_{54}$. The first subscripts, 1, 2, 3, 4, 5, show no inversions, and the second subscripts, 5, 3, 2, 1, 4, show seven (the 5 being to the left of four smaller numbers, the 3 to the left of two, and the 2 to the left of one). Accordingly, $-a_{15}a_{23}a_{32}a_{41}a_{54}$ is one of the $5!$ terms in the expanded fifth-order determinant.
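The sign rule can be turned directly into a (very slow, $k!$-term) determinant. The Python sketch below counts inversions in the second subscripts only, since the first subscripts are taken in natural order and contribute none; it reproduces the count of seven for the example term. The function names are hypothetical.

```python
from itertools import permutations
from math import prod


def inversions(seq):
    """Number of pairs (i, j), i < j, with seq[i] > seq[j]."""
    return sum(1 for i in range(len(seq)) for j in range(i + 1, len(seq)) if seq[i] > seq[j])


def term_sign(second_subscripts):
    """Sign of a term whose first subscripts run 1, 2, ..., k in order."""
    return -1 if inversions(second_subscripts) % 2 else 1


def det_by_terms(a):
    """Sum of the k! signed terms, each with one element from every row and column."""
    k = len(a)
    return sum(term_sign(p) * prod(a[i][p[i]] for i in range(k))
               for p in permutations(range(k)))


print(inversions([5, 3, 2, 1, 4]))                      # 7
print(term_sign([5, 3, 2, 1, 4]))                       # -1
print(det_by_terms([[2, 1, 0], [1, 3, 1], [0, 1, 4]]))  # 18
```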

$a_{11}$ is the leading element of the determinant $A$.

$a_{11}a_{22} \cdots a_{kk}$ is the leading term.

The $a_{11}$ to $a_{kk}$ is the principal diagonal, and the $a_{k1}$ to $a_{1k}$ is the secondary diagonal.

Interchanging two rows, or two columns, changes 
the sign of the determinant. 

If the $i$th row and the $j$th column of $A$ are crossed out, the remaining determinant is called a first minor, designated here $M_{ij}$ to distinguish it from the cofactor. A first minor of the type $M_{ii}$ is a principal first minor.

A positive or negative sign attaches to the $i, j$ position. It is given by $(-1)^{i+j}$. When this sign is attached to the minor we have a cofactor, which may be designated $A_{ij}$:

$$A_{ij} = (-1)^{i+j} M_{ij} \qquad [14:12]$$

A determinant may be expanded in terms of the elements of any row, or column, and their cofactors. E.g., in terms of the elements of the third row:

$$A = a_{31}A_{31} + a_{32}A_{32} + \cdots + a_{3k}A_{3k}, \quad\text{or}\quad A = a_{31}M_{31} - a_{32}M_{32} + \cdots + (-1)^{3+k}a_{3k}M_{3k} \qquad [14:13]$$
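Expansion by cofactors [14:13] and the adjoint form of the inverse [14:06] can be sketched together in Python. Exact fractions are used so that $\mathcal{A}\mathcal{A}^{-1} = \mathcal{I}$ [14:07] holds exactly; the recursion on first minors is meant only to illustrate the definitions, not as an efficient method for large determinants. Indices are zero-based, which leaves the parity of $i + j$, and hence the sign of each cofactor, unchanged.

```python
from fractions import Fraction


def minor(a, i, j):
    """The array left when row i and column j are crossed out (zero-based indices)."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(a) if r != i]


def cofactor(a, i, j):
    """Signed first minor: (-1)**(i + j) times the minor, as in [14:12]."""
    return (-1) ** (i + j) * det(minor(a, i, j))


def det(a):
    """Expansion by the elements of the first row and their cofactors."""
    if len(a) == 1:
        return a[0][0]
    return sum(a[0][j] * cofactor(a, 0, j) for j in range(len(a)))


def inverse(a):
    """[14:06]: element (i, j) of the inverse is the cofactor of a_ji divided by A."""
    A = det(a)
    if A == 0:
        raise ValueError("the determinant vanishes, so there is no inverse")
    k = len(a)
    return [[Fraction(cofactor(a, j, i), A) for j in range(k)] for i in range(k)]


a = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]
# Expanding by the third row (index 2) must give the same value as the first row.
by_third_row = sum(a[2][j] * cofactor(a, 2, j) for j in range(3))
print(det(a), by_third_row)  # 18 18

inv = inverse(a)
product = [[sum(a[i][r] * inv[r][j] for r in range(3)) for j in range(3)] for i in range(3)]
print(product == [[1, 0, 0], [0, 1, 0], [0, 0, 1]])  # True
```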

The terms symmetric, positive definite, and Gramian apply to determinants as well as to square matrices.

A positive definite determinant is one in which all the principal minors are greater than zero.

A Gramian determinant is a symmetric positive definite determinant. This is the characteristic type of major determinant in multiple correlation and regression.

The rank of a non-vanishing determinant is equal to the order of the determinant.

The rank of a vanishing determinant is equal to the order of its largest non-vanishing minor. The rank of a matrix is equal to the rank of the determinant of largest rank that can be made from its rows and columns.

If the elements of any row, or column, of a determinant are multiplied by a constant, the determinant is multiplied by this constant.

If the elements of any row (or column) multi- 
plied by a constant yield the elements of any 
other row (or column), then the determinant is 
equal to zero. 

If replacing an element $a_{ij}$ of the non-vanishing determinant $A$ by zero yields a determinant which equals zero identically, then $a_{ij}$ is a factor of $A$, and the other elements of the $i$th row and of the $j$th column do not appear in the expanded and reduced form of $A$. If $a_{1a}, a_{2b}, \ldots, a_{kg}$ are $k$ such factors, where $k$ is the order of $A$,

$$A = a_{1a}a_{2b} \cdots a_{kg} \times (\text{a sign factor}).$$

If the number of inversions in the order $a, b, \ldots, g$ is even, the sign is positive, and if odd it is negative. In particular, should the factors be $a_{11}, a_{22}, \ldots, a_{kk}$, then

$$A = a_{11}a_{22} \cdots a_{kk} \qquad [14:14]$$


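DETERMINANTAL SOLUTION OF SIMULTANEOUS EQUATIONS

Given $k$ simultaneous linear equations, so written that the constant terms are to the right of the $=$ sign, thus:

$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1k}x_k &= a_{10} \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2k}x_k &= a_{20} \\ &\;\;\vdots \\ a_{k1}x_1 + a_{k2}x_2 + \cdots + a_{kk}x_k &= a_{k0} \end{aligned} \qquad [14:15]$$

Write down the major determinant

$$A = \begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1k} & a_{10} \\ a_{21} & a_{22} & \cdots & a_{2k} & a_{20} \\ \vdots & \vdots & & \vdots & \vdots \\ a_{k1} & a_{k2} & \cdots & a_{kk} & a_{k0} \\ a_{01} & a_{02} & \cdots & a_{0k} & a_{00} \end{vmatrix} \qquad [14:16]$$

in which the value of $a_{00}$ is immaterial in connection with the solution for the $x$'s, as it disappears in all the equations [14:17]. The values of the $x$'s are given by

$$x_i = -\frac{A_{0i}}{A_{00}} \qquad [14:17]$$

in which $A_{0i}$ and $A_{00}$ are the cofactors of $a_{0i}$ and $a_{00}$ in the major determinant, the sign of each cofactor being fixed by the actual row and column position of its element.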
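A minimal check of [14:17], assuming the border row is placed last as in [14:16]. In zero-based Python indexing that row (and the right-hand column) has index $k$, so the cofactor of $a_{0i}$ sits at position $(k, i-1)$. The values placed in the border row and in $a_{00}$ are arbitrary, and the second call shows that the solution does not depend on them. The small systems are invented for illustration, and `solve_by_major_determinant` is a hypothetical name.

```python
from fractions import Fraction


def minor(a, i, j):
    return [row[:j] + row[j + 1:] for r, row in enumerate(a) if r != i]


def cofactor(a, i, j):
    return (-1) ** (i + j) * det(minor(a, i, j))


def det(a):
    if len(a) == 1:
        return a[0][0]
    return sum(a[0][j] * cofactor(a, 0, j) for j in range(len(a)))


def solve_by_major_determinant(coef, rhs, border=None, a00=0):
    """x_i = -A_0i / A_00, from the bordered (major) determinant of [14:16]."""
    k = len(coef)
    if border is None:
        border = [0] * k                          # a_01 ... a_0k: any values will do
    major = [list(coef[i]) + [rhs[i]] for i in range(k)] + [list(border) + [a00]]
    A00 = cofactor(major, k, k)                   # the determinant of the coefficients
    if A00 == 0:
        raise ValueError("the coefficient determinant vanishes")
    return [Fraction(-cofactor(major, k, j), A00) for j in range(k)]


# 2 x1 +   x2 =  5
#   x1 + 3 x2 = 10
print(solve_by_major_determinant([[2, 1], [1, 3]], [5, 10]))
# [Fraction(1, 1), Fraction(3, 1)]
print(solve_by_major_determinant([[2, 1], [1, 3]], [5, 10], border=[7, -2], a00=99))
# [Fraction(1, 1), Fraction(3, 1)]
```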


SECTION 3. THE POINT BINOMIAL

The moments of $(p+q)^n$ about its mean are readily found by use of a reduction formula due to Romanovsky* (1923). We have

Moments of a binomial series

$$\mu_0 = 1 \qquad [14:19]$$

$$\mu_1 = 0 \qquad [14:20]$$

$$\mu_2 = npq \quad \text{(see [9:02])} \qquad [14:21]$$

$$\mu_3 = npq(q-p) \quad \text{(see [9:03])} \qquad [14:22]$$

$$\mu_4 = npq[1 + 3(n-2)pq] \quad \text{(see [9:04])} \qquad [14:23]$$

$$\mu_5 = npq(q-p)[1 + pq(-12 + 10n)] \qquad [14:24]$$

$$\mu_6 = npq\{1 + 5pq[-6 + 5n + pq(24 - 26n + 3n^2)]\} \qquad [14:25]$$

* Romanovsky shows that, with $q = 1 - p$,

$$\mu_{s+1} = pq\left(ns\,\mu_{s-1} + \frac{d\mu_s}{dp}\right) \qquad [14:18]$$

The repeated use of this gives $\mu_2$, $\mu_3$, etc.
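The reduction formula [14:18] lends itself to mechanical use. The Python sketch below carries each $\mu_s$ as a polynomial in $p$ (with $q = 1 - p$ substituted) using exact fractions, applies $\mu_{s+1} = pq(ns\,\mu_{s-1} + d\mu_s/dp)$ for a fixed, illustrative $n$, and compares the results with moments summed directly over the terms of the binomial. The polynomial helpers are hypothetical names written for this example.

```python
from fractions import Fraction
from math import comb

# A polynomial in p is a list of coefficients: c[0] + c[1]*p + c[2]*p**2 + ...


def padd(a, b):
    out = [Fraction(0)] * max(len(a), len(b))
    for i, c in enumerate(a):
        out[i] += c
    for i, c in enumerate(b):
        out[i] += c
    return out


def pmul(a, b):
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def pderiv(a):
    return [i * a[i] for i in range(1, len(a))] or [Fraction(0)]


def peval(a, p):
    return sum(c * p ** i for i, c in enumerate(a))


def romanovsky_moments(n, top):
    """mu_0 ... mu_top of (p + q)**n about its mean, as polynomials in p."""
    pq = [Fraction(0), Fraction(1), Fraction(-1)]  # p*q = p - p**2
    mu = [[Fraction(1)], [Fraction(0)]]            # mu_0 = 1, mu_1 = 0
    for s in range(1, top):
        inner = padd([n * s * c for c in mu[s - 1]], pderiv(mu[s]))
        mu.append(pmul(pq, inner))                 # this is mu_{s+1}
    return mu


def direct_moment(n, p, r):
    """r-th moment about the mean np, summed over the binomial terms."""
    mean = n * p
    return sum(comb(n, x) * p ** x * (1 - p) ** (n - x) * (x - mean) ** r
               for x in range(n + 1))


n, p = 7, Fraction(1, 3)
q = 1 - p
mu = romanovsky_moments(n, 6)
assert all(peval(mu[r], p) == direct_moment(n, p, r) for r in range(7))
assert peval(mu[4], p) == n * p * q * (1 + 3 * (n - 2) * p * q)  # [14:23]
```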

SECTION 4. THE POISSON DISTRIBUTION

The Poisson series is obtained from the binomial by letting $p$, the probability of the event occurring, be indefinitely small, but $n$ so large that $pn$ is finite. Poisson's exponential limit of the binomial $(p+q)^n$ (see Yule and Kendall, 1937, pp. 187-191) is a distribution such that the probability of a variate ($x = 0, 1, 2, \ldots$) taking the value $x$ is $M^x e^{-M}/x!$, or

$$e^{-M}\left(1 + M + \frac{M^2}{2!} + \frac{M^3}{3!} + \cdots\right) \qquad \text{Poisson distribution} \quad [14:26]$$

$e^{-M}$ is the relative frequency of $x = 0$; $e^{-M}M$ the relative frequency of $x = 1$; $e^{-M}M^2/2!$ the relative frequency of $x = 2$; etc. The first four moments of this distribution about the point $x = 0$ are

$$\mu_1' = M = pn \qquad [14:27]$$

$$\mu_2' = M^2 + M \qquad [14:28]$$

$$\mu_3' = M^3 + 3M^2 + M \qquad [14:29]$$

$$\mu_4' = M(M^3 + 6M^2 + 7M + 1) \qquad [14:30]$$

The moments about $M$ are

Moments of a Poisson series

$$\mu_1 = 0 \qquad [14:31]$$

$$\sigma^2 = \mu_2 = M \qquad [14:32]$$

$$\mu_3 = M \qquad [14:33]$$

$$\mu_4 = M(1 + 3M) \qquad [14:34]$$

$$\mu_5 = M(1 + 10M) \qquad [14:35]$$

$$\mu_6 = M(1 + 25M + 15M^2) \qquad [14:36]$$


The moments are determinable from a knowledge of the mean, without knowledge of $p$ and $n$ separately. Thus for the Poisson, $M$ is a sufficient statistic.

Molina (1942) has tabled individual terms and cumulated sums for $M = 0$ to .01 by intervals of .001; for $M = .01$ to 15 by intervals of .01; and for $M = 15$ to 100 by intervals of 1. It seems probable that all ordinary needs can be served by this table.

Pearson (1914) has tabled the successive terms of the Poisson distribution, to six decimal places, for $M$ from .1 to 15, by tenths. All the cumulants of a Poisson distribution are equal (and equal to $M$), a fact not true for any other distribution. If $x_3 = x_1 + x_2$, in which $x_1$ is distributed in the Poisson manner with parameter $M'$, and $x_2$, independent of $x_1$, is distributed in the Poisson manner with parameter $M''$, then $x_3$ is distributed in the Poisson manner with parameter $(M' + M'')$.
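A numerical check of the Poisson moment formulas and of the additive property, sketched in Python. Summing the series to 100 terms is an assumption that is ample for the illustrative $M = 2.5$ used here, since the neglected tail is negligible. The convolution at the end compares $\Pr(x_1 + x_2 = x_3)$ with the Poisson term for parameter $M' + M''$ (written `M1`, `M2` in the code).

```python
from math import exp, factorial, isclose


def poisson_term(x, M):
    """Relative frequency of x in a Poisson distribution with mean M."""
    return M ** x * exp(-M) / factorial(x)


M = 2.5
terms = [poisson_term(x, M) for x in range(100)]  # tail beyond x = 99 is negligible here


def raw_moment(r):
    """Moment about the point x = 0."""
    return sum(f * x ** r for x, f in enumerate(terms))


def central_moment(r):
    """Moment about the mean M."""
    return sum(f * (x - M) ** r for x, f in enumerate(terms))


checks = {
    "[14:28]": (raw_moment(2), M ** 2 + M),
    "[14:29]": (raw_moment(3), M ** 3 + 3 * M ** 2 + M),
    "[14:30]": (raw_moment(4), M * (M ** 3 + 6 * M ** 2 + 7 * M + 1)),
    "[14:33]": (central_moment(3), M),
    "[14:34]": (central_moment(4), M * (1 + 3 * M)),
    "[14:35]": (central_moment(5), M * (1 + 10 * M)),
    "[14:36]": (central_moment(6), M * (1 + 25 * M + 15 * M ** 2)),
}
for label, (summed, formula) in checks.items():
    assert isclose(summed, formula, rel_tol=1e-9), label

# Additivity: x3 = x1 + x2 with independent Poisson x1 (parameter M1) and x2 (M2).
M1, M2 = 1.2, 0.7
for x3 in range(15):
    conv = sum(poisson_term(x1, M1) * poisson_term(x3 - x1, M2) for x1 in range(x3 + 1))
    assert isclose(conv, poisson_term(x3, M1 + M2), rel_tol=1e-12)
```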

SECTION 5. THE HYPERGEOMETRIC SERIES

This series can be explained by reference to an example. Let there be an urn containing $n$ balls, of which $qn$ are black and $pn$ are white. Let $N$ balls be drawn at a time. Let $X$ equal the number of white balls in such a sample of $N$. This number will vary from sample to sample from 0 to $N$, or to $pn$, whichever is the smaller.

When $N$ are drawn, the probability that the first of the $N$ is not-white is $qn/n$; that, this having come to pass, the second is not-white it is $(qn-1)/(n-1)$; that, the preceding having come to pass, the third is not-white it is $(qn-2)/(n-2)$; etc., to $(qn-N+1)/(n-N+1)$, the probability that, the preceding having been not-white, the $N$th is also not-white. The product of all these is the probability that when $N$ are drawn none will be white. This gives the proportionate frequency given in the first row of the accompanying table. The other rows are obtained by a very similar procedure.

$$\begin{array}{c|l}
X = \text{no. of white balls} & \text{Proportionate frequency of occurrence} \\ \hline
0 & \dfrac{qn(qn-1)(qn-2)\cdots(qn-N+1)}{n(n-1)(n-2)\cdots(n-N+1)} \\[2ex]
1 & \dfrac{pn(qn)(qn-1)\cdots(qn-N+2)}{n(n-1)(n-2)\cdots(n-N+1)} \cdot \dfrac{N}{1} \\[2ex]
2 & \dfrac{pn(pn-1)(qn)\cdots(qn-N+3)}{n(n-1)(n-2)\cdots(n-N+1)} \cdot \dfrac{N(N-1)}{1 \cdot 2} \\[2ex]
\vdots & \qquad \vdots \\
X & \dfrac{(pn)!\,(qn)!\,N!\,(n-N)!}{n!\,(pn-X)!\,(qn-N+X)!\,(N-X)!\,X!} \\[2ex]
\vdots & \qquad \vdots \\
N & \dfrac{pn(pn-1)(pn-2)\cdots(pn-N+1)}{n(n-1)(n-2)\cdots(n-N+1)}
\end{array} \qquad [14:37]$$

This series of frequencies is a hypergeometric series. When $n$ is infinitely large with reference to $N$ it becomes a binomial series.

Pearson (1924) has provided formulas for the first four moments as herewith:

Moments of a hypergeometric series

$$M = pN \qquad [14:38]$$

$$\mu_0 = 1 \qquad [14:39]$$

$$\mu_1 = 0 \qquad [14:40]$$

$$\mu_2 = \frac{Npq(n-N)}{n-1} \qquad [14:41]$$

$$\mu_3 = \frac{Npq(q-p)(n-N)(n-2N)}{(n-1)(n-2)} \qquad [14:42]$$

$$\mu_4 = \frac{Npq(n-N)\{n(n+1) - 6N(n-N) + 3pq[n^2(N-2) - nN^2 + 6N(n-N)]\}}{(n-1)(n-2)(n-3)} \qquad [14:43]$$
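The frequencies of [14:37] and the moment formulas [14:38] to [14:43] can be compared exactly with Python fractions. The urn (20 balls, 8 white, samples of 6) is purely illustrative, and `hypergeometric_frequency` is a hypothetical helper. It computes the general term of [14:37] as a ratio of binomial coefficients, which is the same quantity written more compactly.

```python
from fractions import Fraction
from math import comb


def hypergeometric_frequency(X, n, white, N):
    """Frequency of X white balls in a draw of N from n balls, `white` of them white."""
    return Fraction(comb(white, X) * comb(n - white, N - X), comb(n, N))


n, white, N = 20, 8, 6
p = Fraction(white, n)
q = 1 - p
freqs = {X: hypergeometric_frequency(X, n, white, N) for X in range(min(N, white) + 1)}
mean = sum(X * f for X, f in freqs.items())


def mu(r):
    """r-th moment about the mean."""
    return sum((X - mean) ** r * f for X, f in freqs.items())


assert sum(freqs.values()) == 1
assert mean == p * N                                                              # [14:38]
assert mu(2) == N * p * q * (n - N) / (n - 1)                                     # [14:41]
assert mu(3) == N * p * q * (q - p) * (n - N) * (n - 2 * N) / ((n - 1) * (n - 2))  # [14:42]
assert mu(4) == (N * p * q * (n - N)
                 * (n * (n + 1) - 6 * N * (n - N)
                    + 3 * p * q * (n ** 2 * (N - 2) - n * N ** 2 + 6 * N * (n - N)))
                 / ((n - 1) * (n - 2) * (n - 3)))                                 # [14:43]
```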


An important statistic in connection with a probability function such as the binomial, the Poisson, or the hypergeometric distribution is the sum of the frequencies up to a designated point. This matter is adequately covered by tables in connection with the Poisson distribution. Camp (1924 and 1925) gives methods for obtaining this desired sum for both the binomial and hypergeometric distributions. They would seem to be adequate for most practical needs, though not of the highest precision nor of the simplicity of computation which one would desire.

The obtaining of this sum for the still more useful normal, $t$ (Student's $t$), Pearson Type III, and variance ratio distributions is, in all cases, possible, as explained elsewhere in this text.

SECTION 6. FACTORIALS AND THE GAMMA FUNCTION

The gamma function of a number $x+1$ is defined (for $x > -1$) by

$$\Gamma(x+1) = \int_0^\infty e^{-y} y^x \, dy \qquad \text{The gamma function} \quad [14:44]$$

It is such that

$$\Gamma(x+1) = x\,\Gamma(x) \qquad \text{Gamma reduction formula} \quad [14:45]$$

This holds for all values of $x$, integral, fractional, positive or negative (excepting $x = 0, -1, -2, \ldots$, where the gamma function is infinite). The term factorial is commonly applied to positive integers, in which case

$$x! = x\,(x-1)! \qquad \text{Factorial reduction formula} \quad [14:46]$$

When $x$ is a positive integer

$$\Gamma(x+1) = x! \qquad [14:47]$$

It is frequently serviceable to generalize the concept factorial so that [14:46] and [14:47] hold when $x$ is not a positive integer. When this is done we note that by repeated applications of [14:45] the factorial of any number can be expressed as the product of certain numbers times the factorial of a number $x'$, wherein $0 \le x' \le 1$. For example

$$2.5! = 2.5 \times 1.5! = 2.5 \times 1.5 \times .5!$$

Another example:

$$(-2.5)! = \frac{(-1.5)!}{-1.5} = \frac{(-.5)!}{(-1.5) \times (-.5)} = \frac{.5!}{(-1.5) \times (-.5) \times .5}$$

Thus, except for the labor involved, a table of $x!$ for values from .00 to 1.00, or of $\Gamma(x)$ for $x$ from 1.00 to 2.00, is sufficient to yield, with operations of multiplication and division, all factorials and all gamma functions. We provide herewith a brief table of this sort.
TABLE XIV A

BASIC FACTORIALS AND GAMMA FUNCTIONS

| x | x!, or Γ(x+1) | x | x!, or Γ(x+1) | x | x!, or Γ(x+1) | x | x!, or Γ(x+1) |
|---|---|---|---|---|---|---|---|
| −.01 | 1.00587 | | | | | | |
| .00 | 1.00000 | .25 | .90640 | .50 | .88623 | .75 | .91906 |
| .01 | .99433 | .26 | .90440 | .51 | .88659 | .76 | .92137 |
| .02 | .98884 | .27 | .90250 | .52 | .88704 | .77 | .92376 |
| .03 | .98355 | .28 | .90072 | .53 | .88757 | .78 | .92623 |
| .04 | .97844 | .29 | .89904 | .54 | .88818 | .79 | .92877 |
| .05 | .97350 | .30 | .89747 | .55 | .88887 | .80 | .93138 |
| .06 | .96874 | .31 | .89600 | .56 | .88964 | .81 | .93408 |
| .07 | .96415 | .32 | .89464 | .57 | .89049 | .82 | .93685 |
| .08 | .95973 | .33 | .89338 | .58 | .89142 | .83 | .93969 |
| .09 | .95546 | .34 | .89222 | .59 | .89243 | .84 | .94261 |
| .10 | .95135 | .35 | .89115 | .60 | .89352 | .85 | .94561 |
| .11 | .94740 | .36 | .89018 | .61 | .89468 | .86 | .94869 |
| .12 | .94359 | .37 | .88931 | .62 | .89592 | .87 | .95184 |
| .13 | .93993 | .38 | .88854 | .63 | .89724 | .88 | .95507 |
| .14 | .93642 | .39 | .88785 | .64 | .89864 | .89 | .95838 |
| .15 | .93304 | .40 | .88726 | .65 | .90012 | .90 | .96177 |
| .16 | .92980 | .41 | .88676 | .66 | .90167 | .91 | .96523 |
| .17 | .92670 | .42 | .88636 | .67 | .90330 | .92 | .96877 |
| .18 | .92373 | .43 | .88604 | .68 | .90500 | .93 | .97240 |
| .19 | .92089 | .44 | .88581 | .69 | .90678 | .94 | .97610 |
| .20 | .91817 | .45 | .88566 | .70 | .90864 | .95 | .97988 |
| .21 | .91558 | .46 | .88560 | .71 | .91057 | .96 | .98374 |
| .22 | .91311 | .47 | .88563 | .72 | .91258 | .97 | .98768 |
| .23 | .91075 | .48 | .88575 | .73 | .91467 | .98 | .99171 |
| .24 | .90852 | .49 | .88595 | .74 | .91683 | .99 | .99581 |
| | | | | | | 1.00 | 1.00000 |
| | | | | | | 1.01 | 1.00427 |
| $E_2$ | .00002 | | .00001 | | .00001 | | .00001 |
| $E_3$ | .00000 | | .00000 | | .00000 | | .00000 |
The $E_2$ values recorded at the bottom of the columns give the maximal 2-point, or linear, interpolation error as determined for entries halfway down the column. Similarly the maximal 3-point, or quadratic, interpolation error is given as $E_3$.
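The reduction of any gamma function or factorial to a tabled value can be sketched as follows. In this Python sketch `math.gamma` stands in for the lookup in Table XIV A on the interval from 1 to 2 (an assumption made only so that the example runs); everything else is the repeated application of [14:45], as in the worked examples for $2.5!$ and $(-2.5)!$ given earlier. The name `gamma_by_reduction` is hypothetical.

```python
from math import gamma, isclose


def gamma_by_reduction(x):
    """Gamma(x) reduced, by [14:45], to a value of Gamma on [1, 2].

    math.gamma is used only for that final lookup, in place of Table XIV A.
    """
    if x <= 0 and x == int(x):
        raise ValueError("Gamma is infinite at 0, -1, -2, ...")
    factor = 1.0
    while x > 2:        # Gamma(x) = (x - 1) * Gamma(x - 1)
        x -= 1
        factor *= x
    while x < 1:        # Gamma(x) = Gamma(x + 1) / x
        factor /= x
        x += 1
    return factor * gamma(x)


# 2.5! = Gamma(3.5) = 2.5 * 1.5 * .5!   and   (-2.5)! = Gamma(-1.5) = .5! / ((-1.5)(-.5)(.5))
print(gamma_by_reduction(3.5), gamma_by_reduction(-1.5))
```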

For gammas and factorials of small numbers the reduction formulas [14:45] and [14:46] may first be employed and then the necessary values found in Table XIV A. For large numbers, $x > 4$, Stirling's formula

$$\ln(x!) = \left(x + \tfrac{1}{2}\right)\ln x - x + \tfrac{1}{2}\ln(2\pi) + \frac{1}{12x} - \frac{1}{360x^3} + \frac{1}{1260x^5} - \frac{1}{1680x^7} \qquad [14:48]$$

and Forsyth's formula

$$\Gamma(x+1) = \sqrt{2\pi}\left(\frac{x^2 + x + \frac{1}{6}}{e^2}\right)^{\frac{2x+1}{4}} \qquad [14:49]$$

are highly reliable. If $x$ is large the error in Forsyth's formula is less than $1/(240x^3)$ of the whole. (See Pearson 1895 and 1901.) Even for $x = 1.5$ the error is only in the neighborhood of 1 per cent. (See Kelley 1923, p. 136.)

A function frequently needed in curve fitting is $\Gamma(x+1)/(x^x e^{-x})$. The logarithm of this to the base 10, as given by Pearson (1934), is

$$\log\frac{\Gamma(x+1)}{x^x e^{-x}} = .3990899 + \tfrac{1}{2}\log x + .080929\,\sin\frac{25.623^\circ}{x} \qquad [14:50]$$

the angle $25.623^\circ/x$ being expressed in degrees.
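A numerical comparison of [14:48], [14:49], and [14:50] against `math.lgamma`, sketched in Python at a few illustrative values of $x$. It also checks the stated bound of $1/(240x^3)$ on the relative error of Forsyth's formula. Reading the sine argument in [14:50] as degrees is part of the reconstruction above, and the comparison is consistent with that reading.

```python
from math import e, exp, lgamma, log, log10, pi, radians, sin, sqrt


def stirling_ln_factorial(x):
    """[14:48]: Stirling's series for ln(x!) through the 1/(1680 x**7) term."""
    return ((x + 0.5) * log(x) - x + 0.5 * log(2 * pi)
            + 1 / (12 * x) - 1 / (360 * x ** 3) + 1 / (1260 * x ** 5) - 1 / (1680 * x ** 7))


def forsyth_factorial(x):
    """[14:49]: sqrt(2 pi) * ((x**2 + x + 1/6) / e**2) ** ((2x + 1) / 4)."""
    return sqrt(2 * pi) * ((x * x + x + 1 / 6) / e ** 2) ** ((2 * x + 1) / 4)


def pearson_log10_ratio(x):
    """[14:50]: log10 of Gamma(x + 1) / (x**x * e**-x), sine argument in degrees."""
    return 0.3990899 + 0.5 * log10(x) + 0.080929 * sin(radians(25.623 / x))


for x in (5.0, 10.0, 20.0):
    exact_ln = lgamma(x + 1)  # ln(x!) to machine precision
    stirling_err = stirling_ln_factorial(x) - exact_ln
    forsyth_rel_err = forsyth_factorial(x) / exp(exact_ln) - 1
    pearson_err = pearson_log10_ratio(x) - (exact_ln - x * log(x) + x) / log(10)
    print(x, stirling_err, forsyth_rel_err, 1 / (240 * x ** 3), pearson_err)
```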
The incomplete gamma function: If the upper limit of the integral [14:44] is changed from $\infty$ to $y$, we have an "incomplete gamma function," which has been tabled by Pearson. This is important in that it is the probability integral of a Pearson Type III distribution, the equation being written with the origin at the mode. The Type III distribution has been found to be an excellent approximation to many biological and physical phenomena. One illustration is the distribution of yearly maximum flow of water in a river.

SECTION 7. THE NUMERICAL SOLUTION OF HIGHER DEGREE PARABOLIC AND OF EXPONENTIAL EQUATIONS

So many procedures have been employed that the student may expect to find in the mathematical literature (see Scarborough, 1930, Ch. IX, and Whittaker and Robinson, 1924, Ch. VI) a method adapted to his particular problem. Only two are given here, the first being serviceable in the case of a rapidly convergent power series and the second when the function can be expressed as the sum or the product of two functions, each sufficiently simple to permit of rapid plotting.

If

$$y = a + bx + cx^2 + dx^3 + ex^4 + \text{negligible terms} \qquad [14:51]$$

we let $x_1$ be a first estimate of the correct value. Substitute $x_1$ in all terms of [14:51] beyond the $x$ term and solve for $x$, calling the value obtained $x_2$, a second estimate. Substitute $x_2$ in terms beyond the $x$ term, solve, and call the value obtained $x_3$, a third estimate. Continue the iterative process until two successive values, say $x_3$ and $x_4$, differ by a small, but not quite negligible, amount. Compute $y_3$ and $y_4$:

$$y_3 = a + bx_3 + cx_3^2 + dx_3^3 + ex_3^4 \qquad [14:52]$$

$$y_4 = a + bx_4 + cx_4^2 + dx_4^3 + ex_4^4 \qquad [14:53]$$

Then an interpolated value, $x$, from the following table, in which all values except $x$ are known, is the required answer.

| Argument | Function |
|---|---|
| $x_3$ | $y_3$ |
| $x$ | $y$ |
| $x_4$ | $y_4$ |

That is, by linear interpolation, $x = x_3 + (x_4 - x_3)(y - y_3)/(y_4 - y_3)$.

Generally, if $y$ is not intermediate between $y_3$ and $y_4$, further iterations are desirable before interpolating to obtain the final value of $x$.
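The iteration and final interpolation just described can be sketched in Python. The quartic and the target value are invented for illustration, and `solve_quartic_series` is a hypothetical name. Each pass evaluates the terms beyond the $x$ term at the previous estimate and solves the linear term for $x$; once two successive estimates agree within a small tolerance, the answer is interpolated linearly between them.

```python
def solve_quartic_series(y, coeffs, x1, tol=1e-4, max_iter=100):
    """Solve y = a + b x + c x^2 + d x^3 + e x^4 for x by the iteration of [14:51]."""
    a, b, c, d, e = coeffs

    def poly(x):
        return a + b * x + c * x ** 2 + d * x ** 3 + e * x ** 4

    x_prev = x1
    for _ in range(max_iter):
        # Higher terms evaluated at the previous estimate; the linear term solved for x.
        x_next = (y - a - c * x_prev ** 2 - d * x_prev ** 3 - e * x_prev ** 4) / b
        if abs(x_next - x_prev) < tol:
            break
        x_prev = x_next
    else:
        raise RuntimeError("estimates did not settle; start nearer the root")
    y3, y4 = poly(x_prev), poly(x_next)           # [14:52] and [14:53]
    if y4 == y3:
        return x_next
    return x_prev + (x_next - x_prev) * (y - y3) / (y4 - y3)  # linear interpolation


coeffs = (1.0, 2.0, 0.3, 0.05, 0.01)
x = solve_quartic_series(4.0, coeffs, x1=1.5)
print(x)
```

Here the successive estimates oscillate about the root, so $y_3$ and $y_4$ fall on opposite sides of the target $y$, which is the situation in which the text recommends interpolating.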

If $f(x) = 0$ is to be solved for $x$, it will frequently be possible to write

$$f(x) = \phi(x) - \psi(x) = 0 \qquad [14:54]$$

in which $\phi(x)$ and $\psi(x)$ are not difficult to plot. The intersection, or intersections, of the two curves give values of $x$ for which $f(x) = 0$. After a point of intersection has been approximately located from the graphs, select two values of $x$, $x_1$ and $x_2$, upon opposite sides of this point and as close to it as possible, and compute $\phi(x_1)$, $\phi(x_2)$, $\psi(x_1)$, and $\psi(x_2)$. We desire a value $x$ between $x_1$ and $x_2$ such that $\phi(x) = \psi(x)$. If the points $x_1$ and $x_2$ are sufficiently close, the curves in this region may be represented by two intersecting straight lines, and from similar triangles it is readily derived that

$$x = \frac{(x_2 - x_1)[\phi(x_1) - \psi(x_1)]}{\phi(x_1) - \phi(x_2) - \psi(x_1) + \psi(x_2)} + x_1 \qquad [14:55]$$

A substitution of $x$ into $\phi(x)$ and $\psi(x)$ will disclose the accuracy of the solution.
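Formula [14:55] is a single chord-intersection step. A Python sketch, using the illustrative equation $e^{-x} = x$ (not from the text) with $\phi(x) = e^{-x}$ and $\psi(x) = x$, bracketed by $x_1 = .56$ and $x_2 = .58$; the function name is hypothetical.

```python
from math import exp


def chord_intersection(phi, psi, x1, x2):
    """[14:55]: intersect the chords of phi and psi drawn between x1 and x2."""
    num = (x2 - x1) * (phi(x1) - psi(x1))
    den = phi(x1) - phi(x2) - psi(x1) + psi(x2)
    return num / den + x1


def phi(x):
    return exp(-x)


def psi(x):
    return x


x = chord_intersection(phi, psi, 0.56, 0.58)
# Substituting x back into phi and psi discloses the accuracy, as the text suggests.
print(x, phi(x) - psi(x))
```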

If $f(x)$ can be written as

$$f(x) = \phi(x)\,\psi(x) \qquad [14:56]$$

we can employ logarithms, defining two new functions

$$P(x) = \log\phi(x) - \log f(x), \qquad Q(x) = -\log\psi(x) \qquad [14:57]$$

which are related as in [14:54],

$$P(x) - Q(x) = 0$$

Solving as before, we obtain the $x$ which satisfies [14:56].


THE NUMERICAL SOLUTION OF COMPLICATED SIMULTANEOUS EQUATIONS

The numerical solution of simultaneous equations which cannot readily be written in the form $y = f(x)$ can frequently be rapidly accomplished by iterative processes. If two equations are functions of $x$ and $y$, they can generally be written in the following form in a number of ways:

$$x = f(x, y) \qquad [14:58]$$

$$y = F(x, y) \qquad [14:59]$$

An iterative process upon these equations may, or may not, converge in the neighborhood of some solution, $x_0, y_0$. Graphs of [14:58] and [14:59] reveal, let us say, that $x = x_1$ and $y = y_1$ is an approximate solution. A second approximation is $x_2, y_2$, given by

$$x_2 = f(x_1, y_1) \qquad [14:60]$$

$$y_2 = F(x_2, y_1) \qquad [14:61]$$

A third approximation is $x_3, y_3$:

$$x_3 = f(x_2, y_2) \qquad [14:62]$$

$$y_3 = F(x_3, y_2) \qquad [14:63]$$

This process is to be continued until two successive sets of values differ negligibly. If convergence is not present, a different segregation of terms may be made so that we have $x = \phi(x, y)$ and $y = \psi(x, y)$ in lieu of [14:58] and [14:59]. Upon successive iteration these may lead to convergence. For iteration in the case of two such functions, say [14:58] and [14:59], to be convergent it is sufficient that

$$\left|\frac{\partial f}{\partial x}\right| + \left|\frac{\partial F}{\partial x}\right| < 1 \qquad [14:64]$$

and

$$\left|\frac{\partial f}{\partial y}\right| + \left|\frac{\partial F}{\partial y}\right| < 1 \qquad [14:65]$$

in the neighborhood of $x_0, y_0$. This test for convergence can be made prior to iteration, or omitted if one prefers to discover the outcome by trial.
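The alternating scheme of [14:60] to [14:63] can be sketched in Python. The pair of equations, $x = (y+1)/3$ and $y = (x^2+2)/5$, is invented for illustration. Near its solution $\partial f/\partial x = 0$, $\partial F/\partial x = 2x/5$, $\partial f/\partial y = 1/3$, and $\partial F/\partial y = 0$, so both sums in [14:64] and [14:65] are below 1 and the iteration is expected to settle. `iterate_pair` is a hypothetical name.

```python
def iterate_pair(f, F, x, y, tol=1e-12, max_iter=500):
    """x_(r+1) = f(x_r, y_r), then y_(r+1) = F(x_(r+1), y_r), as in [14:60] to [14:63]."""
    for _ in range(max_iter):
        x_new = f(x, y)
        y_new = F(x_new, y)
        if abs(x_new - x) < tol and abs(y_new - y) < tol:
            return x_new, y_new
        x, y = x_new, y_new
    raise RuntimeError("no convergence: try a different segregation of terms")


def f(x, y):
    return (y + 1) / 3


def F(x, y):
    return (x * x + 2) / 5


x0, y0 = iterate_pair(f, F, 0.0, 0.0)
print(x0, y0)
```

Eliminating $y$ shows the exact solution is $x_0 = (15 - \sqrt{197})/2$, $y_0 = 3x_0 - 1$, which the iterates reproduce.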


SECTION 8. TRANSFORMING RANK AND PERCENTAGE POSITIONS INTO QUANTITATIVE SCORES

Normalizing a distribution: In many situations in which a rank order, a percentage, or a proportion position is given, it is desired to treat the data quantitatively. In general, if $p_1$, $p_2$, and $p_3$ are three proportions, the numerical values of the differences $(p_1 - p_2)$ and $(p_2 - p_3)$ are an inaccurate quantitative statement of the underlying variables which yielded the three proportions. For example, if in a given trait individual A exceeds $p_1$ of the members of his group, individual B exceeds $p_2$, and individual C exceeds $p_3$, and if the distribution of talent in the group is normal, the correct relationship is not $(p_1 - p_2)$ to $(p_2 - p_3)$, but $(x_1 - x_2)$ to $(x_2 - x_3)$, in which the $x$'s are the deviates in a normal distribution corresponding to the $p$'s. If, for each $p$, an $x$ is abstracted from a table of a unit normal distribution, the variable thus obtained will have many properties which fit it for quantitative study. This process is called normalizing a distribution. There obviously should be some plausibility in the assumption of near normality of the underlying trait before it is done. Even so, the $x$-variables resulting do not have all the characteristics of the quantitative measure in the case of a normally distributed trait.
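Normalizing amounts to an inverse lookup in the unit normal table. A Python sketch, with the standard library's `statistics.NormalDist` (Python 3.8 or later) standing in for the printed table; the three proportions are illustrative. A large step in $p$ near the middle and a much smaller step in the tail turn out to be nearly equal steps in $x$.

```python
from statistics import NormalDist

unit_normal = NormalDist(mu=0.0, sigma=1.0)


def normalize(proportions):
    """Replace each proportion p by the unit-normal deviate x with P(X < x) = p."""
    return [unit_normal.inv_cdf(p) for p in proportions]


p = [0.50, 0.84, 0.98]  # proportions of the group exceeded by A, B, C (illustrative)
x = normalize(p)
print([round(v, 3) for v in x])
print(p[1] - p[0], p[2] - p[1])  # .34 and .14 in p ...
print(x[1] - x[0], x[2] - x[1])  # ... about .99 and 1.06 in x
```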

If a test is equally reliable throughout a certain range, scores received by two individuals taking the test, whether low, intermediate, or high, have the same standard error, which appropriately is called the standard error of measurement. In general, two proportions, $p_1$ and $p_2$, have unequal sampling errors, for, as given by [13:107], these standard errors are $\sqrt{p_1q_1/N}$ and $\sqrt{p_2q_2/N}$ respectively. Also the equivalent $x_1$ and $x_2$ deviates have unequal sampling errors, for, as derived from [4:03], these standard errors are $\sqrt{p_1q_1/N}\,/z_1$ and $\sqrt{p_2q_2/N}\,/z_2$ respectively, $z_1$ and $z_2$ being the ordinates of the unit normal curve at $x_1$ and $x_2$. Though the normalizing of ranks, or of a series of proportions, or of percentages is known frequently to be very useful, nevertheless a transformation of such ranks, proportions, or percentages into scores which are equally reliable will clearly have value in cases where a single concept of the reliability of a score due to sampling is crucial in interpretation.

Transforming rank or percentage position into equally reliable deviation scores: Let us consider a transformation of $p$ into $x$ such that $\sigma_x$ (the standard error of each $x$, not the standard deviation of the distribution of the $x$'s) is independent of the value of $p$; that is, $\sigma_x = \sqrt{c/N}$, in which $c$ is a constant and $N$ the size of the sample yielding $p$. (An illustration follows in which $N$ is the number of items comprising a test.) The variable $p$ is the observed value whose standard error is known, namely $\sqrt{pq/N}$, and the standard error which attaches to $x$ is to be derived from that which attaches to $p$. Let

$$x = f(p) \qquad [14:66]$$

then

$$dx = f'(p)\,dp \qquad [14:67]$$

and immediately

$$\sigma_x = f'(p)\,\sigma_p = f'(p)\sqrt{\frac{pq}{N}} = \sqrt{\frac{c}{N}} \qquad [14:68]$$

so that

$$f'(p) = \frac{\sqrt{c}}{\sqrt{p(1-p)}} \qquad [14:69]$$

Integrating,

$$x = \left(2\sin^{-1}\sqrt{p} - \frac{\pi}{2}\right)\sqrt{c} + k \qquad [14:70]$$

If $k$, the constant of integration, be set equal to 0, and $c$ set equal to 1, we have

$$x = 2\sin^{-1}\sqrt{p} - \frac{\pi}{2} \qquad [14:71]$$

Then $x$ takes values from $-\pi/2$ to $\pi/2$ as $p$ takes values from 0 to 1, and $\sigma_x$, no matter the value of $x$, is a function of the size of the sample only and is equal to $1/\sqrt{N}$. This $\sigma_x$ is the standard error of each $x$ and not the standard deviation of the distribution of $x$'s, which takes a range of values depending upon the distribution of the $p$'s.
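A short Python check of [14:71] against a few entries of Table XIV B, followed by a numerical version of the argument in [14:68]: the slope $f'(p)$ times $\sqrt{pq/N}$ comes out as $1/\sqrt{N}$ whatever the value of $p$. The value $N = 40$ and the finite-difference step are illustrative choices, and `arcsine_score` is a hypothetical name.

```python
from math import asin, pi, sqrt


def arcsine_score(p):
    """[14:71]: x = 2 arcsin(sqrt(p)) - pi/2, from -pi/2 at p = 0 to pi/2 at p = 1."""
    return 2 * asin(sqrt(p)) - pi / 2


# Entries of Table XIV B (x negative for p < .5 and positive for p > .5).
for p, table_x in [(0.01, -1.3705), (0.20, -0.6435), (0.50, 0.0), (0.83, 0.7208)]:
    assert abs(arcsine_score(p) - table_x) < 5e-5

# Delta-method check: f'(p) * sqrt(pq / N) should equal 1 / sqrt(N) for every p.
N, h = 40, 1e-6
for p in (0.1, 0.5, 0.9):
    slope = (arcsine_score(p + h) - arcsine_score(p - h)) / (2 * h)
    assert abs(slope * sqrt(p * (1 - p) / N) - 1 / sqrt(N)) < 1e-6
```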

In certain problems it is desirable to make an analysis of the variance of a set of independent proportions, but the proportions, based upon the same numbers of cases, are different and accordingly unequally reliable. If each proportion is transformed into an $x$ by [14:71], the resulting measures are independent and equally reliable, so that their further study by analysis of variance methods is simple and accurate.

As an illustration of a problem which might well use this transformation, consider a fundamental operations in arithmetic test, consisting of $N$ problems of substantially equal difficulty, administered without time limit and scored for speed of completion and again for accuracy of computation. The accuracy score is $p$, the proportion of answers that are correct, and, as explained, it has unequal reliability from subject to subject. If for each $p$ we employ [14:71] or Table XIV B to obtain $x$, we now have a score with a constant standard error $1/\sqrt{N}$. This happy outcome has been accomplished at a certain expense. The distribution of the $x$'s is presumably not a correct representation of the true distribution of underlying arithmetical computation ability. Specifically, had this true distribution been normal, neither the distribution of $p$'s nor that of the $x$'s would be normal. How serious this is becomes an appropriate topic for investigation in each particular instance.

TABLE XIV B
(x negative for p < .5 and positive for p > .5)

  p      x       p   |   p      x      p   |   p      x      p
 .00   1.5708   1.00 |  .17   .7208   .83  |  .34   .3257   .66
 .01   1.3705    .99 |  .18   .6945   .82  |  .35   .3047   .65
 .02   1.2870    .98 |  .19   .6687   .81  |  .36   .2838   .64
 .03   1.2226    .97 |  .20   .6435   .80  |  .37   .2631   .63
 .04   1.1681    .96 |  .21   .6187   .79  |  .38   .2424   .62
 .05   1.1198    .95 |  .22   .5944   .78  |  .39   .2218   .61
 .06   1.0759    .94 |  .23   .5704   .77  |  .40   .2014   .60
 .07   1.0353    .93 |  .24   .5469   .76  |  .41   .1810   .59
 .08    .9973    .92 |  .25   .5236   .75  |  .42   .1607   .58
 .09    .9614    .91 |  .26   .5007   .74  |  .43   .1405   .57
 .10    .9273    .90 |  .27   .4780   .73  |  .44   .1203   .56
 .11    .8947    .89 |  .28   .4556   .72  |  .45   .1002   .55
 .12    .8633    .88 |  .29   .4334   .71  |  .46   .0801   .54
 .13    .8331    .87 |  .30   .4115   .70  |  .47   .0600   .53
 .14    .8038    .86 |  .31   .3898   .69  |  .48   .0400   .52
 .15    .7754    .85 |  .32   .3683   .68  |  .49   .0200   .51
 .16    .7478    .84 |  .33   .3469   .67  |  .50   .0000   .50

The [14:71] transformation may be useful when the original data consist of ranks, or equivalent percentages, or proportions. If N subjects are ranked in an order of merit, the proportion of cases falling below a person is his p-score. In computing this, the proportion represented by the person himself is split into halves, one-half being added to the proportion below him and one-half to the proportion above him; thus the person ranking lowest has a p-score = .5/N, the second lowest a p = 1.5/N, the third lowest a p = 2.5/N, etc. These p-scores form a rectangular distribution which we may treat as continuous if N is at all substantial. Whereas in the arithmetic fundamentals test the p's may yield a unimodal distribution covering a small range, in the present case the p's extend uniformly from p = 0 to p = 1. We now know the form of the distribution of p, and if the [14:71] transformation is made we can determine the form of the distribution of x. The equation of the distribution of p, so written that the total area is 1, is

$$z_p = 1 \qquad (\text{from } p = 0 \text{ to } p = 1) \qquad [14:72]$$

$$\frac{dx}{dp} = \frac{1}{\sqrt{p - p^2}} \qquad [14:73]$$

Let the number of cases in the interval dx be f_x and the number in the interval dp be f_p. Then

$$z_x = \frac{f_x}{dx}, \qquad \text{the ordinate in the } x\text{-distribution} \qquad [14:74]$$

$$z_p = \frac{f_p}{dp} = 1, \qquad \text{the ordinate in the } p\text{-distribution} \qquad [14:75]$$

The element of area z_x dx equals the corresponding element of area z_p dp; that is, z_x dx = dp. Utilizing [14:73] we have

$$z_x = \sqrt{p - p^2} \qquad [14:76]$$

Obtaining p from [14:71] and substituting, we get the sine-cosine distribution

$$z_x = \sin\left(\frac{x}{2} + \frac{\pi}{4}\right)\cos\left(\frac{x}{2} + \frac{\pi}{4}\right) = \tfrac{1}{2}\cos x, \qquad -\frac{\pi}{2} \le x \le \frac{\pi}{2} \qquad [14:77]$$


This is a symmetrical distribution, with limits in x of −π/2 and π/2, of total area 1, having a mean of zero, a variance of (π² − 8)/4, a Pearson β₁ of 0, a Pearson β₂ of

$$\beta_2 = \frac{\pi^4 - 48\pi^2 + 384}{\pi^4 - 16\pi^2 + 64} = 2.19375,$$

an area between any two values as given by [14:78], and a standard error of every x, if derived from a p with standard error of √(pq/N) (which, presumably, will seldom be the case), of 1/√N. The area between x₁ and x₂ is given by

$$\int_{x_1}^{x_2} z_x\,dx = \tfrac{1}{2}\left(\sin x_2 - \sin x_1\right), \qquad -\frac{\pi}{2} \le x_1 < x_2 \le \frac{\pi}{2} \qquad [14:78]$$
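As a check on the constants just given, the short Python sketch below integrates the sine-cosine density ½cos x numerically and compares the results with the closed forms: total area 1, variance (π² − 8)/4, and β₂ = (π⁴ − 48π² + 384)/(π⁴ − 16π² + 64). Composite Simpson's rule is used here only as a convenient quadrature, and the mean needs no check since the density is symmetrical about zero.

```python
import math

def simpson(f, a: float, b: float, m: int = 2000) -> float:
    """Composite Simpson's rule with m (even) subintervals."""
    h = (b - a) / m
    total = f(a) + f(b)
    for j in range(1, m):
        total += (4 if j % 2 else 2) * f(a + j * h)
    return total * h / 3

def density(x: float) -> float:
    return 0.5 * math.cos(x)          # the sine-cosine ordinate z_x

lo, hi = -math.pi / 2, math.pi / 2
area = simpson(density, lo, hi)
mu2 = simpson(lambda x: x**2 * density(x), lo, hi)   # variance, since the mean is 0
mu4 = simpson(lambda x: x**4 * density(x), lo, hi)   # fourth moment about the mean

print(area, mu2, (math.pi**2 - 8) / 4)
print(mu4 / mu2**2, (math.pi**4 - 48 * math.pi**2 + 384) / (math.pi**4 - 16 * math.pi**2 + 64))
```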


If a table of sines for circular arguments is not available, x may be expressed in terms of degrees, employing the relationship π radians = 180 degrees, or

$$57.29577\,951\,x = \text{corresponding number of degrees} \qquad [14:79]$$

$$.01745\,32925\,(\text{degrees}) = \text{corresponding number of radians} \qquad [14:80]$$
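The rank case described earlier in this section can be sketched in Python as follows, again assuming that [14:71] is x = sin⁻¹(2p − 1). Ranks are turned into p-scores by the half-case rule, p = (i − .5)/N for the person of rank i counting from the bottom, and then transformed. Because the p-scores are rectangular, the proportion of x's at or below any value t should come close to (1 + sin t)/2, the area of the sine-cosine distribution from −π/2 up to t. The function names are hypothetical.

```python
import math

def rank_p_scores(n: int) -> list[float]:
    """p-scores for ranks 1 (lowest) to n: half of each person's own share lies below him."""
    return [(i - 0.5) / n for i in range(1, n + 1)]

def sine_cosine_cdf(t: float) -> float:
    """Area of the sine-cosine distribution from -pi/2 up to t (for -pi/2 <= t <= pi/2)."""
    return (1.0 + math.sin(t)) / 2.0

n = 200
xs = [math.asin(2.0 * p - 1.0) for p in rank_p_scores(n)]

for t in (-1.0, -0.5, 0.0, 0.5, 1.0):
    observed = sum(x <= t for x in xs) / n
    print(f"t={t:+.1f}  observed={observed:.3f}  sine-cosine area={sine_cosine_cdf(t):.3f}")
```

With evenly spaced p-scores the two columns can differ by no more than one half case in N.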


SECTION 9. THE SQUARE ROOT TRANSFORMATION
(See Bartlett 1936)

In the situation wherein independent variables are normally distributed, a sum (or a weighted sum, by introducing weights), as given by [14:81] (see Chapter VIII, Section 10),

$$X = X_1 + X_2 + \cdots + X_k \qquad [14:81]$$

is distributed normally, and such an estimate as X/k of the mean of the right-hand member X's is an efficient estimate, if these X's are equally reliable. The analysis of variance proceeds via the equation

$$V_X = V_1 + V_2 + \cdots + V_k$$

If the independent X's are Poisson distributed, a more efficient prediction equation than one predicting X/k is one predicting √X/k; thus, if

$$Y_1 = \sqrt{X_1}, \quad Y_2 = \sqrt{X_2}, \quad \ldots, \quad Y_k = \sqrt{X_k},$$

$$Y = Y_1 + Y_2 + \cdots + Y_k,$$

$$V(Y) = V(Y_1) + V(Y_2) + \cdots + V(Y_k) \qquad [14:82]$$

Though this is not a totally adequate transformation* (see Cochran 1940), it has such merit as to suggest its use when the original variables are Poisson distributed and an analysis of variance is desired.
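The variance-stabilizing property that recommends √X for Poisson variables can be seen directly from the Poisson probabilities. The Python sketch below is an illustration, not Bartlett's or Cochran's own procedure: it computes V(X) and V(√X) by direct summation over the Poisson distribution (truncated far in the upper tail) for several means. V(X) equals the mean and so differs from one X to another, while V(√X) settles near 1/4 once the mean is moderately large. The function name is hypothetical.

```python
import math

def poisson_moments(mean: float, transform) -> tuple[float, float]:
    """Mean and variance of transform(X) for X ~ Poisson(mean), by direct summation.

    The sum stops far in the upper tail, where the omitted probability is negligible.
    """
    upper = int(mean + 20 * math.sqrt(mean) + 20)
    m1 = m2 = 0.0
    for k in range(upper + 1):
        # log of e^(-mean) mean^k / k!, computed in logs to avoid overflow
        prob = math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))
        value = transform(k)
        m1 += prob * value
        m2 += prob * value * value
    return m1, m2 - m1 * m1

for mean in (2.0, 10.0, 50.0, 100.0):
    _, v_raw = poisson_moments(mean, lambda k: k)
    _, v_sqrt = poisson_moments(mean, math.sqrt)
    print(f"mean={mean:6.1f}  V(X)={v_raw:8.3f}  V(sqrt X)={v_sqrt:.4f}")
```

For small means V(√X) is noticeably above 1/4, which is the region where refinements such as Cochran's matter.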

SECTION 10. CUBE ROOT TRANSFORMATION

The Wilson-Hilferty transformation is a linear function of a cube root. It very nearly normalizes the χ² distribution. As employed in Chapter IX, Section 5, it is the basis for very nearly normalizing variance ratio distributions. As these distributions are to be found in all realms of social, biological, and physical statistics, the cube root transformation may be expected to have wide utility.
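The Wilson-Hilferty transformation is commonly stated in this form: if χ² has n degrees of freedom, (χ²/n)^(1/3) is approximately normal with mean 1 − 2/(9n) and variance 2/(9n). The Python sketch below uses that standard statement to convert a χ² value into an approximate normal deviate. The precise version employed in Chapter IX, Section 5 may be written differently, and the function name is illustrative.

```python
import math

def wilson_hilferty_z(chi2: float, df: int) -> float:
    """Approximate standard normal deviate corresponding to chi2 with df degrees of freedom."""
    if chi2 <= 0 or df <= 0:
        raise ValueError("chi2 and df must be positive")
    centre = 1.0 - 2.0 / (9.0 * df)          # approximate mean of (chi2/df)^(1/3)
    spread = math.sqrt(2.0 / (9.0 * df))     # approximate standard deviation
    return ((chi2 / df) ** (1.0 / 3.0) - centre) / spread

# 18.307 and 43.773 are the familiar 5 percent points of chi-square for 10 and 30 d.o.f.;
# the transformed values should lie close to the normal 5 percent point, 1.645.
print(round(wilson_hilferty_z(18.307, 10), 3), round(wilson_hilferty_z(43.773, 30), 3))
```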

* Cochran refines his √X values by an adjustment which is small except in the case of an X that equals 0.

SECTION 11. CERTAIN PROPERTIES OF DIFFERENCES IN TABLED ENTRIES

We employ the notation of Chapter XIII, Section 12.

If in a table all t's except t₀ are correct and t₀ is too large by the amount ε, the errors in the differences take the pattern:

   t        Δ^I     Δ^II    Δ^III   Δ^IV    Δ^V     Δ^VI
                                                    +ε
                                            +ε
                                    +ε              −6ε
                            +ε              −5ε
                    +ε              −4ε             +15ε
            +ε              −3ε             +10ε
 t₀ + ε             −2ε             +6ε             −20ε
            −ε              +3ε             −10ε
                    +ε              −4ε             +15ε
                            −ε              +5ε
                                    +ε              −6ε
                                            −ε
                                                    +ε

It will be noticed that the largest error in the even Δ's is opposite the tabled entry which is in error. When an error in some t value is suspected, it is frequently possible to locate it by thus studying the differences.
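The pattern can be generated mechanically. The Python sketch below (the helper name `difference_columns` is hypothetical) takes a table whose entries are all correct except for an error of 1 in t₀, forms successive differences of the errors alone, and lists the nonzero errors in each order. They are binomial coefficients with alternating signs, and in each even order the largest falls opposite the faulty entry.

```python
def difference_columns(values: list[float], max_order: int) -> list[list[float]]:
    """Return [values, first differences, second differences, ...] up to max_order."""
    columns = [list(values)]
    for _ in range(max_order):
        prev = columns[-1]
        columns.append([b - a for a, b in zip(prev, prev[1:])])
    return columns

# Errors only: every entry t(-6) ... t(6) is correct except t0, which is too large by 1.
errors = [0] * 6 + [1] + [0] * 6
cols = difference_columns(errors, 6)
for order, col in enumerate(cols[1:], start=1):
    print(order, [e for e in col if e != 0])
```

Since differencing is linear, the errors in the differences of any real table are these same multiples of ε, added to the true differences.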

The following formula, noted by Cosens (1944), is particularly useful to check an entry, t₀, which one questions, by determining what this entry would be on the basis of the six neighboring entries:

$$t_0 = .75\,(t_{-1} + t_1) - .30\,(t_{-2} + t_2) + .05\,(t_{-3} + t_3) - .05\,\Delta^{VI}_{-3} \qquad [14:83]$$

When the recorded t₀ is presumably in error, the value Δ^VI will, presumably, be greatly in error. If neighboring portions of the table warrant the belief that Δ^VI is negligibly small, the Δ^VI term in [14:83] may be dropped.
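A Python sketch of the check, with the Δ^VI term of [14:83] dropped (function names are hypothetical). The sixth difference of a polynomial of degree five or less is zero, so for such a table the formula reproduces an entry exactly. The example plants an error in one entry of a fifth-degree polynomial table: the estimate from the six neighbors recovers the correct value, the sixth difference computed through the faulty entry is −20 times the error, and the full formula, with its Δ^VI term, returns the recorded entry.

```python
def cosens_estimate(t: dict[int, float]) -> float:
    """t0 estimated from its six neighbours by [14:83], dropping the sixth-difference term."""
    return 0.75 * (t[-1] + t[1]) - 0.30 * (t[-2] + t[2]) + 0.05 * (t[-3] + t[3])

def sixth_difference(t: dict[int, float]) -> float:
    """Sixth difference running from t[-3] to t[3]: binomial weights with alternating signs."""
    weights = {-3: 1, -2: -6, -1: 15, 0: -20, 1: 15, 2: -6, 3: 1}
    return sum(w * t[k] for k, w in weights.items())

def f(x: float) -> float:
    return x**5 - 2 * x**3 + x + 3      # degree 5, so its sixth differences vanish

table = {k: f(1.0 + 0.1 * k) for k in range(-3, 4)}
table[0] += 0.001                       # suppose t0 was miscopied
print(table[0], cosens_estimate(table), f(1.0), sixth_difference(table))
```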

SECTION 12. EXPANDING A TABLE BY INTERPOLATING VALUES

Let us be given a table with entries t for equally spaced arguments a, and let it be desired to interpolate (n − 1) additional values between a t value and a neighboring t value. For example, if the arguments run .00, .01, .02, . . . with corresponding tabled entries, it might be desired to expand the table so that the arguments run .000, .001, .002, . . . with corresponding tabled entries. The interval in the expanded table is 1/10 that in the original table. In this case n = 10, and 9 additional values are to be interpolated between existing values. The tabled entries have, of necessity, an error which may be as large as 1/2 in the last decimal place recorded, so these tabled entries should be accurate to one or more, preferably more, decimal places farther than is required in the interpolated values.

Differences in the original table are computed to the first order that becomes negligibly small. We will assume this to be the sixth order, so that Δ^VI = 0, and illustrate the solution for this case. If Δ's represent the differences in the original table and δ's those in the expanded table, the relationships between them are given herewith. For simplicity the subscript 0, or any other subscript which remains constant, which attaches to every δ and Δ, has been omitted.

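$$\delta^{VI} = \frac{\Delta^{VI}}{n^{6}} = 0 \qquad [14:84]$$

$$\delta^{V} = \frac{\Delta^{V}}{n^{5}} \qquad [14:85]$$

$$\delta^{IV} = \frac{\Delta^{IV}}{n^{4}} - \frac{2(n-1)}{n^{5}}\,\Delta^{V} \qquad [14:86]$$

$$\delta^{III} = \frac{\Delta^{III}}{n^{3}} - \frac{3(n-1)}{2n^{4}}\,\Delta^{IV} + \frac{(n-1)(7n-5)}{4n^{5}}\,\Delta^{V} \qquad [14:87]$$

$$\delta^{II} = \frac{\Delta^{II}}{n^{2}} - \frac{n-1}{n^{3}}\,\Delta^{III} + \frac{(n-1)(11n-7)}{12n^{4}}\,\Delta^{IV} - \frac{(n-1)(2n-1)(5n-3)}{12n^{5}}\,\Delta^{V} \qquad [14:88]$$

$$\delta^{I} = \frac{\Delta^{I}}{n} - \frac{n-1}{2n^{2}}\,\Delta^{II} + \frac{(n-1)(2n-1)}{3!\,n^{3}}\,\Delta^{III} - \frac{(n-1)(2n-1)(3n-1)}{4!\,n^{4}}\,\Delta^{IV} + \frac{(n-1)(2n-1)(3n-1)(4n-1)}{5!\,n^{5}}\,\Delta^{V} \qquad [14:89]$$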


Having the δ's, the interpolated values are readily obtained. A ready check for accuracy is available by employing overlapping regions and computing certain of the interpolated values twice. The preceding formulas serve, by dropping off the higher order terms, if some Δ of lower order than Δ^VI approximates zero. If it is a difference of higher order than Δ^VI that approximates zero, similar formulas to the preceding can be derived.
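The whole procedure, under this section's assumption that Δ^VI = 0, can be sketched in Python as below (helper names are hypothetical). The leading differences Δ^I to Δ^V are converted to δ's by [14:85] to [14:89], and the expanded table is then built by repeated addition down the difference columns. Exact rational arithmetic (`fractions.Fraction`) is used so that, for a fifth-degree polynomial, where Δ^VI really is zero, the expansion is reproduced without error; with real tabled entries the rounding caution given above still applies.

```python
from fractions import Fraction

def leading_differences(values, order):
    """Forward differences Δ^I ... Δ^order at the first entry of values."""
    row, result = list(values), []
    for _ in range(order):
        row = [b - a for a, b in zip(row, row[1:])]
        result.append(row[0])
    return result

def subdivided_differences(D, n):
    """δ^I ... δ^V from Δ^I ... Δ^V by [14:85]-[14:89], assuming Δ^VI = 0."""
    d1, d2, d3, d4, d5 = D
    n = Fraction(n)
    s5 = d5 / n**5
    s4 = d4 / n**4 - 2 * (n - 1) * d5 / n**5
    s3 = d3 / n**3 - 3 * (n - 1) * d4 / (2 * n**4) + (n - 1) * (7 * n - 5) * d5 / (4 * n**5)
    s2 = (d2 / n**2 - (n - 1) * d3 / n**3 + (n - 1) * (11 * n - 7) * d4 / (12 * n**4)
          - (n - 1) * (2 * n - 1) * (5 * n - 3) * d5 / (12 * n**5))
    s1 = (d1 / n - (n - 1) * d2 / (2 * n**2) + (n - 1) * (2 * n - 1) * d3 / (6 * n**3)
          - (n - 1) * (2 * n - 1) * (3 * n - 1) * d4 / (24 * n**4)
          + (n - 1) * (2 * n - 1) * (3 * n - 1) * (4 * n - 1) * d5 / (120 * n**5))
    return [s1, s2, s3, s4, s5]

def expand_table(t0, D, n, steps):
    """t0 and the next `steps` entries at 1/n of the original interval, by adding differences."""
    s = subdivided_differences(D, n)
    values, current = [t0], t0
    for _ in range(steps):
        current += s[0]
        for k in range(4):               # each column absorbs the next higher (old) difference
            s[k] += s[k + 1]
        values.append(current)
    return values

def f(x):
    return x**5 - 3 * x**3 + x

original = [f(Fraction(a)) for a in range(7)]      # arguments 0, 1, ..., 6
D = leading_differences(original, 5)
expanded = expand_table(original[0], D, 10, 60)     # arguments 0, .1, ..., 6.0
print([float(v) for v in expanded[:4]])
```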


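SECTION 13. TRIGONOMETRIC FUNCTIONS OF SUMS AND DIFFERENCES

Given sin, cos, and tan of θ,

$$\sin\frac{\theta}{2} = \sqrt{\frac{1 - \cos\theta}{2}}; \qquad \cos\frac{\theta}{2} = \sqrt{\frac{1 + \cos\theta}{2}}; \qquad \tan\frac{\theta}{2} = \frac{\sin\theta}{1 + \cos\theta} \qquad [14:90]$$

$$\sin 2\theta = 2\sin\theta\cos\theta; \qquad \cos 2\theta = \cos^2\theta - \sin^2\theta \qquad [14:91]$$

$$\tan 2\theta = \frac{2\tan\theta}{1 - \tan^2\theta} \qquad [14:92]$$

Given sin, cos, and tan of θ and φ,

$$\sin(\theta \pm \phi) = \sin\theta\cos\phi \pm \cos\theta\sin\phi \qquad [14:93]$$

$$\cos(\theta \pm \phi) = \cos\theta\cos\phi \mp \sin\theta\sin\phi \qquad [14:94]$$

$$\tan(\theta \pm \phi) = \frac{\tan\theta \pm \tan\phi}{1 \mp \tan\theta\tan\phi} \qquad [14:95]$$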


SECTION 14. SPACE OF TWO DIMENSIONS

A few properties of a straight line. There are various forms for the equation.

$$ax + by + c = 0 \qquad \text{the general form} \qquad [14:96]$$

$$y = a + bx \qquad [14:97]$$

in which a is the intercept on the y axis and b the slope of the line.

$$\frac{y}{a} + \frac{x}{b} = 1 \qquad [14:98]$$

in which a and b are the intercepts on the y and x axes respectively.

$$\frac{y - y_1}{y_2 - y_1} = \frac{x - x_1}{x_2 - x_1} \qquad [14:99]$$

in which (x₁, y₁) and (x₂, y₂) are two points through which the line passes.

The perpendicular distance of the point (x₁, y₁) from the line [14:96] is

$$\frac{a x_1 + b y_1 + c}{\pm\sqrt{a^2 + b^2}} \qquad [14:100]$$

in which the radical takes the sign opposite to that of c.
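A small Python sketch of [14:100] with the sign rule just stated (the function name is illustrative). With the radical given the sign opposite to that of c, the origin always receives a negative distance when c ≠ 0, so a positive result marks a point lying on the far side of the line from the origin; the magnitude is the ordinary perpendicular distance. The case c = 0 is not covered by the rule, and the sketch simply uses the positive root there.

```python
import math

def signed_distance_to_line(a: float, b: float, c: float, x1: float, y1: float) -> float:
    """Distance of (x1, y1) from ax + by + c = 0, radical signed opposite to c ([14:100])."""
    radical = math.hypot(a, b)          # sqrt(a^2 + b^2)
    if c > 0:
        radical = -radical              # sign opposite to that of c
    return (a * x1 + b * y1 + c) / radical

# The line x + y - 2 = 0, written with a = 1, b = 1, c = -2.
print(signed_distance_to_line(1, 1, -2, 2, 2))   # a point beyond the line
print(signed_distance_to_line(1, 1, -2, 0, 0))   # the origin
```

Multiplying the whole equation by −1 leaves the result unchanged, since numerator and radical both change sign.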


Rotation of axes in two-dimensional space: Let x₁ and x₂ be the original variables and y₁ and y₂ the variables after rotation. If the axes rotate counterclockwise through the angle θ, the transforming equations are

$$y_1 = x_1\cos\theta + x_2\sin\theta \qquad [14:101]$$

$$y_2 = -x_1\sin\theta + x_2\cos\theta \qquad [14:102]$$

The reverse equations, expressing the same relationship, are

$$x_1 = y_1\cos\theta - y_2\sin\theta \qquad [14:101a]$$

$$x_2 = y_1\sin\theta + y_2\cos\theta \qquad [14:102a]$$


Correlation functions of rotated variables: Let x₁, x₂, and x₃ be three correlated variables with means of zero, variances of V₁, V₂, and V₃, and covariances c₁₂, c₁₃, and c₂₃. Let x₃ remain unchanged, but let x₁ and x₂ be transformed into y₁ and y₂ by the rotation given by [14:101] and [14:102]. Then

$$V(y_1) = V_1\cos^2\theta + V_2\sin^2\theta + 2c_{12}\sin\theta\cos\theta \qquad [14:103]$$

$$V(y_2) = V_1\sin^2\theta + V_2\cos^2\theta - 2c_{12}\sin\theta\cos\theta \qquad [14:104]$$

$$V(y_1) + V(y_2) = V_1 + V_2 \qquad [14:105]$$

$$c(y_1y_2) = c_{12}\left(\cos^2\theta - \sin^2\theta\right) - (V_1 - V_2)\sin\theta\cos\theta \qquad [14:106]$$

$$c(y_1x_3) = c_{13}\cos\theta + c_{23}\sin\theta \qquad [14:107]$$

$$c(y_2x_3) = -c_{13}\sin\theta + c_{23}\cos\theta \qquad [14:108]$$

$$[c(y_1x_3)]^2 + [c(y_2x_3)]^2 = c_{13}^2 + c_{23}^2 \qquad [14:109]$$

If the rotation in the x₁, x₂ plane is to major and minor axes, or such as to make V(y₁) a maximum, it of necessity at the same time makes V(y₂) a minimum and the covariance c(y₁y₂) zero. In this case the angle θ is given by

$$\tan 2\theta = \frac{2c_{12}}{V_1 - V_2} \qquad [14:110]$$

Tables to facilitate such rotations, giving sin θ, cos θ, sin²θ, cos²θ, sin θ cos θ, and 2 sin θ cos θ for argument θ or argument tan 2θ, are given in Kelley (1935).
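A Python sketch of the rotation to principal axes (helper names are hypothetical). It takes θ as one half of the angle whose tangent is given by [14:110], letting `math.atan2` choose the quadrant so that y₁ receives the larger variance, and then evaluates [14:103], [14:104], and [14:106]. The covariance of the rotated variables should vanish and, as [14:105] requires, the total variance should be unchanged.

```python
import math

def principal_rotation(v1: float, v2: float, c12: float) -> float:
    """Angle θ (radians) with tan 2θ = 2c12 / (V1 - V2), chosen so that V(y1) is the maximum."""
    return 0.5 * math.atan2(2.0 * c12, v1 - v2)

def rotated_moments(v1, v2, c12, theta):
    """V(y1), V(y2), and c(y1 y2) after the rotation [14:101]-[14:102]."""
    s, c = math.sin(theta), math.cos(theta)
    vy1 = v1 * c * c + v2 * s * s + 2 * c12 * s * c      # [14:103]
    vy2 = v1 * s * s + v2 * c * c - 2 * c12 * s * c      # [14:104]
    cy = c12 * (c * c - s * s) - (v1 - v2) * s * c       # [14:106]
    return vy1, vy2, cy

theta = principal_rotation(4.0, 1.0, 1.5)
print(math.degrees(theta), rotated_moments(4.0, 1.0, 1.5, theta))
```

The two rotated variances are the largest and smallest variances obtainable by any rotation, here (5 ± √18)/2.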

SECTION 15. SPACE OF THREE OR MORE DIMENSIONS

The general form of the equation of a plane in three-dimensional space is

$$ax + by + cz + d = 0 \qquad [14:111]$$

The intercept form, in which a, b, and c are the intercepts on the x, y, and z axes, is

$$\frac{x}{a} + \frac{y}{b} + \frac{z}{c} = 1 \qquad [14:112]$$

The distance of the plane [14:111] from the point (x₁, y₁, z₁) is

$$\frac{a x_1 + b y_1 + c z_1 + d}{\pm\sqrt{a^2 + b^2 + c^2}} \qquad [14:113]$$

in which the sign of the radical is opposite to that of d.

Two non-parallel planes intersect in a straight line. Accordingly the equations of two planes define a line.

Three planes, in general, intersect in a point, and the equations of three planes define a point.

For the angle between two lines, we let the lines be defined by

$$\frac{x - x_1}{a_1} = \frac{y - y_1}{b_1} = \frac{z - z_1}{c_1} \qquad [14:114]$$

and

$$\frac{x - x_2}{a_2} = \frac{y - y_2}{b_2} = \frac{z - z_2}{c_2} \qquad [14:115]$$

then

$$\cos\theta = \frac{a_1a_2 + b_1b_2 + c_1c_2}{\sqrt{a_1^2 + b_1^2 + c_1^2}\,\sqrt{a_2^2 + b_2^2 + c_2^2}} \qquad [14:116]$$

in which θ is the angle between the two lines.

The sine of the angle between the straight line [14:114] and the plane [14:111] is

$$\sin\theta = \frac{a_1a + b_1b + c_1c}{\sqrt{a_1^2 + b_1^2 + c_1^2}\,\sqrt{a^2 + b^2 + c^2}} \qquad [14:117]$$
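These formulas translate directly into code. The Python sketch below (function names are hypothetical) evaluates the point-to-plane distance [14:113] with its sign rule, the cosine of the angle between two lines [14:116], and the sine of the angle between a line and a plane [14:117]. A line enters only through its direction numbers (a₁, b₁, c₁), and a plane through its coefficients (a, b, c, d).

```python
import math

def norm(v):
    return math.sqrt(sum(t * t for t in v))

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def distance_to_plane(plane, point):
    """[14:113]: plane = (a, b, c, d) for ax + by + cz + d = 0; radical signed opposite to d."""
    a, b, c, d = plane
    radical = norm((a, b, c))
    if d > 0:
        radical = -radical
    return (dot((a, b, c), point) + d) / radical

def cos_between_lines(dir1, dir2):
    """[14:116]: cosine of the angle between lines with direction numbers dir1 and dir2."""
    return dot(dir1, dir2) / (norm(dir1) * norm(dir2))

def sin_line_plane(direction, plane):
    """[14:117]: sine of the angle between a line and the plane ax + by + cz + d = 0."""
    return dot(direction, plane[:3]) / (norm(direction) * norm(plane[:3]))

plane = (0.0, 0.0, 1.0, -2.0)            # the plane z = 2
print(distance_to_plane(plane, (5.0, 1.0, 7.0)))
print(cos_between_lines((1, 0, 0), (1, 1, 0)))
print(sin_line_plane((0, 0, 1), plane), sin_line_plane((1, 1, 0), plane))
```

A line perpendicular to the plane gives a sine of 1, and a line parallel to it gives 0.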


SECTION 16. LAGRANGE MULTIPLIERS

These multipliers provide a method for finding the maximum or minimum of a function when one or more conditions are imposed upon the variables in the function. The general statement is as follows:

Let f be the function to be maximized, or minimized. Let φ₁ = 0, φ₂ = 0, . . . , φ_k = 0 be the conditions imposed upon the variables. f, φ₁, φ₂, . . . , φ_k are functions of x₁, x₂, . . . , x_n. Write

$$\psi = f + \lambda_1\phi_1 + \lambda_2\phi_2 + \cdots + \lambda_k\phi_k \qquad [14:118]$$

Obtain the n partial derivatives and set them equal to zero:

$$\frac{\partial\psi}{\partial x_1} = 0, \quad \frac{\partial\psi}{\partial x_2} = 0, \quad \ldots, \quad \frac{\partial\psi}{\partial x_n} = 0 \qquad [14:119]$$

These n equations, together with the k equations φ₁ = 0, φ₂ = 0, . . . , φ_k = 0, provide a set of (n + k) equations from which the λ's may be eliminated and the maximizing, or minimizing, values of the x's obtained.

To illustrate: Let f = ½xy = the area of a triangle with base x and altitude y, and let it be further imposed that x + y² = 9. It is required to find the dimensions yielding a triangle with maximum area. We have

$$f = \tfrac{1}{2}xy \qquad [a]$$

$$\phi_1 = x + y^2 - 9 = 0 \qquad [b]$$

$$\psi = \tfrac{1}{2}xy + \lambda_1\left(x + y^2 - 9\right) \qquad [c]$$

$$\frac{\partial\psi}{\partial x} = \tfrac{1}{2}y + \lambda_1 = 0 \qquad [d]$$

$$\frac{\partial\psi}{\partial y} = \tfrac{1}{2}x + 2\lambda_1 y = 0 \qquad [e]$$

Solving [b], [d], and [e] simultaneously (from [d], λ₁ = −y/2; substituting in [e] gives x = 2y², and then [b] gives 3y² = 9), we obtain x = 6 and y = √3 for the required dimensions of the triangle.
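As an informal numerical check on the triangle example, the Python sketch below (a check, not part of the multiplier method itself) substitutes the constraint x = 9 − y² into f and searches a fine grid of y values. It then confirms that the reported solution, with λ₁ = −y/2 taken from [d], satisfies [d], [e], and [b].

```python
import math

def area_on_constraint(y: float) -> float:
    x = 9.0 - y * y            # constraint [b]: x + y^2 = 9
    return 0.5 * x * y         # f = xy/2, [a]

# Grid search over 0 < y < 3, where x stays positive.
step = 1e-5
best_y = max((k * step for k in range(1, int(3 / step))), key=area_on_constraint)
best_x = 9.0 - best_y ** 2
print(best_x, best_y, area_on_constraint(best_y))

# The stationary point given by the multiplier method.
x, y = 6.0, math.sqrt(3.0)
lam = -y / 2.0                 # from [d]
print(0.5 * y + lam, 0.5 * x + 2 * lam * y, x + y * y - 9)   # [d], [e], [b]
```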

SECTION 17. GROWTH CURVES

In the following, a, b, and c are constants and x is time, or age.

The law of organic growth

$$y = a e^{bx} \qquad [14:120]$$

Makeham's force of mortality curve

$$y = c + a e^{bx} \qquad [14:121]$$

The simple logistic growth curve

$$y = \frac{a}{1 + b e^{-cx}} \qquad [14:122]$$

The Gompertz growth curve

$$y = a b^{c^{x}} \qquad [14:123]$$

A growth senescence curve

$$y = \frac{a}{e^{-bx} + e^{cx}} \qquad [14:124]$$

SECTION 18. THE BINOMIAL THEOREM

$$(a+b)^n = a^n + na^{n-1}b + \frac{n(n-1)}{2!}a^{n-2}b^2 + \frac{n(n-1)(n-2)}{3!}a^{n-3}b^3 + \cdots + \frac{n(n-1)(n-2)\cdots(n-n+1)}{n!}b^n \qquad [14:125]$$

The expansion of the polynomial (a + b + c + . . . + k)ⁿ consists of terms the sum of whose exponents is always equal to n. If for some term the exponent of a is α, of b, β, of c, γ, . . . , of k, κ, where of course α + β + γ + . . . + κ = n, this term is given by

$$\frac{n!}{\alpha!\,\beta!\,\gamma!\cdots\kappa!}\,a^{\alpha}b^{\beta}c^{\gamma}\cdots k^{\kappa} \qquad [14:126]$$
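A short Python sketch of [14:126] (function names are illustrative). It computes the coefficient n!/(α!β!γ!…κ!) and, as a check, confirms for small numbers that the terms of the expansion add up to (a + b + c)ⁿ. Only the standard-library functions `math.factorial` and `itertools.product` are used.

```python
import math
from itertools import product

def multinomial_coefficient(exponents: list[int]) -> int:
    """n! / (alpha! beta! ... kappa!), where n is the sum of the exponents."""
    coefficient = math.factorial(sum(exponents))
    for e in exponents:
        coefficient //= math.factorial(e)    # each partial quotient is still an integer
    return coefficient

def expand_by_terms(values: list[int], n: int) -> int:
    """Sum of all terms [14:126] of (values[0] + values[1] + ...)^n."""
    total = 0
    for exps in product(range(n + 1), repeat=len(values)):
        if sum(exps) == n:                   # only exponent sets adding to n occur
            term = multinomial_coefficient(list(exps))
            for v, e in zip(values, exps):
                term *= v ** e
            total += term
    return total

print(multinomial_coefficient([2, 1, 1]))          # coefficient of a^2 b c in (a + b + c)^4
print(expand_by_terms([1, 2, 3], 4), (1 + 2 + 3) ** 4)
```

With only two letters the coefficient reduces to the binomial coefficient of [14:125].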
SECTION 19. STIRLING'S APPROXIMATION TO THE FACTORIAL

$$\ln(N!) = \left(N + \tfrac{1}{2}\right)\ln N + \tfrac{1}{2}\ln(2\pi) - N + \frac{B_1}{2!\,N} - \frac{2!\,B_2}{4!\,N^{3}} + \frac{4!\,B_3}{6!\,N^{5}} - \cdots \qquad [14:127]$$

The B's are Bernoullian numbers (here B₁ = 1/6, B₂ = 1/30, B₃ = 1/42, . . .).

SECTION 20. THE SUM OF THE POSITIVE POWERS OF THE FIRST k NUMBERS

$$1^p + 2^p + \cdots + k^p = \frac{k^{p+1}}{p+1} + \frac{k^p}{2} + \frac{p\,k^{p-1}}{2!}B_1 - \frac{p(p-1)(p-2)\,k^{p-3}}{4!}B_2 + \frac{p(p-1)(p-2)(p-3)(p-4)\,k^{p-5}}{6!}B_3 - \cdots \qquad [14:128]$$

the series terminating with the term in which the exponent of k is 1 or 2. The B's are Bernoullian numbers.
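Both series can be checked numerically. The Python sketch below assumes the convention of these sections, in which the Bernoullian numbers are taken positive (B₁ = 1/6, B₂ = 1/30, B₃ = 1/42, B₄ = 1/30, B₅ = 5/66) and the alternating signs are written into the formulas. It compares the Stirling series of Section 19, carried to the B₃ term, with `math.lgamma(N + 1)`, and it evaluates the power-sum formula of Section 20 in exact rational arithmetic against a direct sum. Function names are hypothetical.

```python
import math
from fractions import Fraction

# Bernoullian numbers in the convention of these sections (all positive).
B = [None, Fraction(1, 6), Fraction(1, 30), Fraction(1, 42), Fraction(1, 30), Fraction(5, 66)]

def stirling_log_factorial(n: float) -> float:
    """ln N! by the Stirling series of Section 19, carried to the B3 term."""
    return ((n + 0.5) * math.log(n) + 0.5 * math.log(2 * math.pi) - n
            + float(B[1]) / (2 * n)                 # B1 / (2! N)
            - 2 * float(B[2]) / (24 * n**3)         # 2! B2 / (4! N^3)
            + 24 * float(B[3]) / (720 * n**5))      # 4! B3 / (6! N^5)

def falling(p: int, count: int) -> int:
    """p (p - 1) ... (p - count + 1)."""
    result = 1
    for j in range(count):
        result *= p - j
    return result

def power_sum(k: int, p: int) -> Fraction:
    """1^p + 2^p + ... + k^p by the series of Section 20 (p >= 1; the B table covers p <= 11)."""
    total = Fraction(k ** (p + 1), p + 1) + Fraction(k ** p, 2)
    j = 1
    while p - 2 * j + 1 >= 1:                # stop after the term in k^1 or k^2
        sign = 1 if j % 2 == 1 else -1
        total += sign * B[j] * falling(p, 2 * j - 1) * Fraction(k ** (p - 2 * j + 1), math.factorial(2 * j))
        j += 1
    return total

print(stirling_log_factorial(10), math.lgamma(11))
print(power_sum(10, 4), sum(i ** 4 for i in range(1, 11)))
```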

SECTION 21. FOURIER SERIES

$$y = a_0 + a_1\sin x + b_1\cos x + a_2\sin 2x + b_2\cos 2x + \cdots \qquad [14:129]$$

This is important in fitting curves to data showing periodicity. It is treated in E. T. Whittaker and G. Robinson, THE CALCULUS OF OBSERVATIONS (1924), Chapter X.
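When the observations are taken at m equally spaced values of x covering one full period, the least-squares coefficients of [14:129] take a particularly simple form, because the sines and cosines are orthogonal over such points. In that case a₀ is the mean of the y's, and for harmonics j below m/2, a_j = (2/m)Σ y sin jx and b_j = (2/m)Σ y cos jx. The Python sketch below illustrates this special case only (the function name is hypothetical). The general treatment is in the reference cited.

```python
import math

def fourier_coefficients(ys: list[float], harmonics: int) -> tuple[float, list[float], list[float]]:
    """a0, [a1..ah], [b1..bh] for ys observed at x = 2*pi*t/m, t = 0..m-1 (needs h < m/2)."""
    m = len(ys)
    if not harmonics < m / 2:
        raise ValueError("need fewer than m/2 harmonics")
    xs = [2 * math.pi * t / m for t in range(m)]
    a0 = sum(ys) / m
    a = [2 / m * sum(y * math.sin(j * x) for x, y in zip(xs, ys)) for j in range(1, harmonics + 1)]
    b = [2 / m * sum(y * math.cos(j * x) for x, y in zip(xs, ys)) for j in range(1, harmonics + 1)]
    return a0, a, b

# Twelve equally spaced values built from known coefficients: y = 3 + 2 sin x - cos 2x.
m = 12
ys = [3 + 2 * math.sin(2 * math.pi * t / m) - math.cos(4 * math.pi * t / m) for t in range(m)]
print(fourier_coefficients(ys, 2))
```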

SECTION 22. TAYLOR'S THEOREM

f(x) may be expanded in the neighborhood of x = a thus:

$$f(x) = f(a) + (x-a)f'(a) + \frac{(x-a)^2}{2!}f''(a) + \cdots + \frac{(x-a)^{n-1}}{(n-1)!}f^{(n-1)}(a) + R \qquad [14:130]$$

in which R is the remainder after n terms, and the successive f-primes are the successive derivatives of the function evaluated for x = a. If the limit of R is zero as n → ∞, we have a Taylor's series, and this is a convergent series. If a = 0 in a Taylor's series of one variable, we have a Maclaurin series. Taylor's theorem also applies, with certain necessary modifications, to functions of more than one variable.

SECTION 23. THE EULER-MACLAURIN FORMULA FOR EVALUATING A DEFINITE INTEGRAL

Let k be the number of intervals of uniform base i in the region from a to a + ki. Then the integral of f(x) dx is given by

$$\frac{1}{i}\int_a^{a+ki} f(x)\,dx = \tfrac{1}{2}f(a) + f(a+i) + f(a+2i) + \cdots + f\big(a+(k-1)i\big) + \tfrac{1}{2}f(a+ki) - \frac{B_1\,i}{2!}\left[f'(a+ki) - f'(a)\right] + \frac{B_2\,i^3}{4!}\left[f'''(a+ki) - f'''(a)\right] - \frac{B_3\,i^5}{6!}\left[f^{(5)}(a+ki) - f^{(5)}(a)\right] + \text{etc.} \qquad [14:131]$$

in which the B's are Bernoullian numbers.
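A Python sketch of [14:131] carried to the B₃ term, using B₁ = 1/6, B₂ = 1/30, B₃ = 1/42 in the convention of these sections. The user supplies f together with its first, third, and fifth derivatives. The example uses f(x) = eˣ from 0 to 1, an arbitrary choice made because every derivative is again eˣ and the exact integral, e − 1, is known.

```python
import math

def euler_maclaurin(f, f1, f3, f5, a: float, i: float, k: int) -> float:
    """Integral of f from a to a + k*i by [14:131], through the B3 correction."""
    b = a + k * i
    trapezoid = 0.5 * f(a) + sum(f(a + j * i) for j in range(1, k)) + 0.5 * f(b)
    correction = (-(1 / 6) * i / 2 * (f1(b) - f1(a))          # B1 i / 2!
                  + (1 / 30) * i**3 / 24 * (f3(b) - f3(a))     # B2 i^3 / 4!
                  - (1 / 42) * i**5 / 720 * (f5(b) - f5(a)))   # B3 i^5 / 6!
    return i * (trapezoid + correction)     # [14:131] gives the integral divided by i

exp = math.exp
estimate = euler_maclaurin(exp, exp, exp, exp, 0.0, 0.1, 10)
print(estimate, math.e - 1)
```

With only ten intervals the uncorrected trapezoidal sum is off in the third decimal place, while the corrected value agrees with e − 1 to about the limit of double precision.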



CHAPTER XV


STATISTICAL TABLES

SECTION 1. SELECTED REFERENCES

The tables given in later sections are likely to be too abbreviated for refined statistical work. They should prove adequate for training purposes and for preliminary work. Much more extensive tables of the sorts here given are available in The Kelley Statistical Tables, in press, 1947.

The accompanying very limited and select references indicate the sources of what are deemed to comprise certain tables important to the statistician. The arrangement is chronological, except where several tables appear under a single institutional authorship.

1814. BARLOW'S TABLES. 1814 and later. Edition of 1930 here referred to. [Generally 8 significant figures in tabled values, with an argument of 4 figures] "Squares, cubes, square roots, cube roots, and reciprocals of all integer numbers up to 10,000." The square roots of N and of 10N are given. The cube root of N is given, but not that of 10N nor that of 100N. A table of powers up to the tenth of all integers from 1 to 100, and of powers up to the 20th of all integers from 1 to 10, is given.

1910. B. O. Peirce. A SHORT TABLE OF INTEGRALS. Second edition, 1910, 144 pages. This is a handy and concise source for 938 formulas: integrals, auxiliary trigonometric, hyperbolic, and elliptic functions, Bessel's functions, series, products, derivatives, etc. It also contains brief tables of the probability integral, elliptic integrals, hyperbolic functions, natural and common logarithms, and trigonometric functions. Third edition revised by W. F. Osgood, 1929, 156 pages.

1914. Karl Pearson, editor. TABLES FOR STATISTICIANS AND BIOMETRICIANS, Part I. (See also Part II, 1931; Cambridge University Press, Tracts for Computers, 1919-37; and Biometrika Publications, 1934-38.)

Table II (W. F. Sheppard) [7 place consequent] Area and ordinate of the normal curve in terms of the abscissa. Interval is .01σ. (½(1 + α) = p, and z = z of Table XV C herein.)

Table XII (W. Palin Elderton) [6 places] Tables for testing goodness of fit. These are χ² tables. The interval in χ² is very coarse, being 1. It is important to note that n' as used in these tables is one greater than d.o.f. The n' arguments are 3 (1) 30.

Table XXVI (W. Palin Elderton) Tables to assist the calculation of the ordinates of [a Type III] curve.

Table XXVII (W. Palin Elderton) Tables of powers of natural numbers 1 to 100. Also Table XXVIII, Sums of powers of natural numbers 1 to 100.

Table XXX (P. F. Everitt) [4 place] Tables to facilitate the determination of tetrachoric correlation. r = .80, .85, .90, .95, and 1.00. For more complete tabulations see Pearson's Tables, Pt. II, Tables VIII and IX.

Table XXXI (J. H. Duffell) [7 places] Logarithms of the gamma function . . . from p = 1 to p = 2 [interval .01].

XXXV A Diagram to determine the type of a frequency distribution from a knowledge of the constants β₁ and β₂.

Tables XXXVII to XLVII (A. J. Rhind) Probable errors of frequency constants connected with the Pearson system of curves.

Table XLIX (Julia Bell) [Beginning with 7 place and ending with 11 place] Logarithm of factorial n from n = 1 to 1000.

Tables LI and LII concern Poisson distributions.

Table LIV (Alice Lee) Tables of the G(r, v) integrals, G(r, v) = ∫ sin^r θ e^{vθ} dθ. Auxiliary functions leading to this integral are tabled for arguments r = 1 (1) 50 and φ° = 1 (1) 45, φ being defined by tan φ = v/r.

1919-1939. Cambridge University Press, Tracts for Computers, Karl Pearson, Editor. Department of Applied Statistics, University of London.

No. 1, 1919. (Eleanor Pairman) Tables of the Digamma and Trigamma Functions. The digamma function is defined F(z) = (d/dz) ln Γ(1 + z), and the trigamma function is defined F'(z) = (d²/dz²) ln Γ(1 + z). F and F' are tabled for argument z = .00 (.02) 20.00.

No. 4, 1921. (Originally computed by A. M. Legendre) Tables of the Logarithms of the Complete Gamma Function to Twelve Figures. Log Γ(a) is tabled. Argument a = 1.000 (.001) 2.000.

No. 8, 1922. (Egon S. Pearson) Table of the Logarithms of the Complete Gamma Function (for arguments 2 to 1200, i.e., beyond Legendre's range). Log Γ(p) is tabled to 10 decimal places. Argument p = 2.0 (.1) 5.0 (.2) 70. (1) 1200.

No. 9, 1923. (John Brownlee) Log Γ(x) from x = 1 to 50.9 by intervals of .01. [7 decimal places].

No. 13, 1926. (James Henderson) Bibliotheca Tabularum Mathematicarum, being a descriptive catalog of mathematical tables. Part I, Logarithmic Tables.

Alexander John Thompson. Logarithmetica Britannica, being a standard table of logarithms to twenty decimal places. No. 11, numbers 90,000 to 100,000 [1924]; no. 14, numbers 80,000 to 90,000 [1927]; no. 16, numbers 40,000 to 50,000 [1928]; no. 17, numbers 50,000 to 60,000 [1931]; no. 18, numbers 60,000 to 70,000 [1923]; no. 19, numbers 10,000 to 20,000 [1934]; no. 20, numbers 70,000 to 80,000 [1935]; no. 21, numbers 30,000 to 40,000 [1937]. The part for numbers 20,000 to 30,000 was announced in 1945 as "at press."

No. 15, 1927. (L. H. C. Tippett) Random Sampling Numbers. 10,400 four-figure numbers.

No. 23, 1938. (L. J. Comrie) Tables of tan⁻¹ x and log (1 + x²).

No. 24, 1939. (M. G. Kendall and B. Babington Smith) Random Sampling Numbers, Second Series. 100,000 digits grouped in twos and fours and in 100 separate thousands.

1923. James W. Glover. TABLES OF APPLIED MATHEMATICS IN FINANCE, INSURANCE, STATISTICS. Part I, "Values of compound interest functions to 8 places of decimals and 7 place logarithms of values of compound interest functions." Part II, "Values of life insurance and disability insurance functions." Part III, "Values of probability and statistical functions." Part IV, "Common logarithms of numbers from 1 to 100,000 to 7 places of decimals."

1930. L. R. Salvosa. Tables of Pearson type III functions and Derivatives of Pearson type III curve. Annals of Math. Stat., Vol. 1, 1930 [six decimal places].

Table I. Areas. Argument, t, is a standard score deviate: t = −4.99 (.01) 9.99. Skewness argument, α₃, is .0 (.1) 1.1.

Table II. Ordinates. Argument t = −5.49 (.01) 9.99. Skewness = 0 (.1) 1.1.

Table III. First six derivatives. Argument t = −9.9 (.1) 14.9. Skewness α₃ = .0 (.1) 1.1.

1931. Karl Pearson, Ed. TABLES FOR STATISTICIANS AND BIOMETRICIANS, Part II.

Table II (T. Kondo and E. M. Elderton) Abscissae, ordinates and ratios [z/p, z/q, p/z, q/z] to ten decimal places of the normal curve to each permille of frequency.

Table III (John P. Mills and d. T. Camp) [5 places] Ratio, area of tail to bounding ordinate, or [q/z], for each percent of deviate.

Tables V-VII (Alice Lee) concern tetrachoric functions.

Tables VIII-IX (Alice Lee, Margaret Moul, Ethel M. Elderton, A. E. P. Church, E. C. Fieller, J. Pretorius, and K. Pearson) [6 place] "For determining the volume of any quadrant or of any cell of a bivariate normal frequency distribution." The interval in both arguments is .1σ. A table for each of the following values of r is given: r = −.95 (.05) 1.00.

Tables XXI-XXII (L. H. C. Tippett) [7 and 6 place] "Distribution of extreme individuals and of the range in samples from a normal population." Table XXI gives distributions for N = 3, 5, 10, 20, 30, 50, 100 (100) 1000. Table XXII gives mean range for N = 1 (1) 1000.

Table XXV (K. Pearson and B. Stoessiger) [7 place] Probability integral for symmetrical curves; β₁ = 0; β₂ = 1 to 3 and 3 onwards.

1932. Jack W. Dunlap and Albert K. Kurtz. HANDBOOK OF STATISTICAL NOMOGRAPHS, TABLES, AND FORMULAS.

Part I. Gives nomographs for obtaining probable errors, or standard errors, or standard deviations of means, sums, differences, frequency in a class, proportion in a class, upper or lower quartile, quartile deviation, 10-90 percentile range, r, and rank correlation coefficient, and also nomographs for biserial r, regression weights based upon biserial r, correction for attenuation, effect of range upon standard deviations and reliability coefficients, intelligence quotients, percentages, etc.

Part II:

Table 72. Squares, square roots [4 decimal places], reciprocals [5 and 6 significant figures], and reciprocals of square roots [6 significant figures]. Argument 1 (1) 1000.

Table 84. "√(1 − r²)" [4 places] Argument .000 (.001) .999.

Table 86. "1 − r²" [4 places] Argument .000 (.001) .999.

Table 88. Reliability of two to ten tests, Spearman-Brown formula. [4 places] Argument r₁ = .00 (.01) .99.

Table 95. Functions used in rank correlation. d² tabled for differences in rank, by .5, from .5 to 75. N(N² − 1) tabled for N = 1 (1) 100. Table also useful in getting N³.

Table 98. Correlation coefficient "from the percentage of unlike signed pairs."

Part III is an extensive table of formulas as used by different authors.

1933. Leone Chesire, Milton Saffir, and L. L. Thurstone. COMPUTING DIAGRAMS FOR THE TETRACHORIC CORRELATION COEFFICIENTS. Consisting of 46 trivariate diagrams, which, when entered by means of the proportionate frequencies in two marginal totals and one cell frequency, yield tetrachoric r. Three decimal place accuracy may be expected.

1933-1935. Harold T. Davis. TABLES OF THE HIGHER MATHEMATICAL FUNCTIONS. Vol. I, 1933. Vol. II, 1935. There is given throughout a discussion of the theoretical bases for, and the graphs and the properties of, the large number of functions dealt with.

Vol. I, Parts III, IV, and V, discuss, and give tables and bibliography covering, a variety of interpolation procedures. Five tables of log Γ and one of 1/Γ are given. The argument interval is small and the number of decimal places varies from 10 to 12 to 15. Six tables give or involve the Psi function, which is the digamma function as defined by Eleanor Pairman in Tracts for Computers, No. 1, 1919. The argument interval is small and the number of decimal places varies from 10 to 12 to 15 to 16 to 20.

Vol. II treats of: (a) the polygamma functions, giving tables of trigamma, tetragamma, pentagamma, and hexagamma functions, employing a fairly small interval in the argument and entries from 10 up to 20 decimal places; (b) Bernoulli polynomials and numbers; (c) Euler polynomials and numbers; (d) Gram polynomials; (e) functions of polynomial approximation.

1934-1938. BIOMETRIKA PUBLICATIONS. Karl Pearson, Ed. University College, London, W.C. 1.

Tables of the Incomplete Beta-Function. 1934. The complete beta-function is defined

$$B(p,q) = \int_0^1 x^{p-1}(1-x)^{q-1}\,dx$$

and the incomplete beta-function

$$B_x(p,q) = \int_0^x x^{p-1}(1-x)^{q-1}\,dx$$

The magnitude tabled is

$$I_x(p,q) = B_x(p,q)/B(p,q)$$

This trivariate tabling required changes in interval in p and q, but a uniform interval in x of .01 for the entire range x = 0 to x = 1 was employed. Even so the tables require 494 pages. We note the relationship between I_x(p,q) and the variance ratio F_ij (see [9:21]); i, j, F_ij, and P_ij refer to Kelley's notation and p, q, and x to Pearson's:

i = 2q;  j = 2p;  . . . = x;  P_ij = I_x(p, q)

It may be stated that P_ij via these Pearson Tables is usually one and sometimes two decimal places more accurate than by the normalizing transformation of Chapter IX, Section 5 herein.

Tables of the Incomplete Gamma Function. Reissue, 1934.

$$I(u,p) = \frac{1}{\Gamma(p+1)}\int_0^{u\sqrt{p+1}} e^{-v}v^{p}\,dv$$

is tabled. It is the probability integral of a Type III distribution. [7 decimal places in Tables I and II and 8 in Table III.] Table I arguments p = .0 (.1) 5.0 (.2) 50.0 and u = .0 (.1) 16.9. Table II arguments p = −1.00 (.05) .00 and u = .0 (.1) 48.6. Table III is of log I'(u,p), wherein I'(u,p) = I(u,p)/u^{p+1}. Arguments u = .0 (.1) 1.5 and p = −1.00 (.05) .0 (.1) 10.0. Table IV provides "constants of the skew curve y = y₀x^p e^{−x}," and Table V gives "five figure values of I(u,p)."

(F. N. David) Tables of the ordinates and probability integral of the distribution of the correlation coefficient in small samples. 1938. Gives the distribution of r for ρ (Kelley's r) = .0 (.1) .9, and N = 3 (1) 25, 50, 100, 200, 400.

1934. H. B. Dwight. TABLES OF INTEGRALS AND OTHER MATHEMATICAL DATA, 222 pages. Similar to, but somewhat more extensive than, Peirce's Table.

1935. Truman L. Kelley. ESSENTIAL TRAITS OF MENTAL LIFE. Table XXII of trigonometric functions used in making rotations of axes. Tabled, to 7 decimal places or 7 significant figures, are tan 2θ, cos θ, sin θ, cos²θ, sin²θ, 2 sin θ cos θ, and sin θ cos θ, for the argument θ = 0°00′ (1′) 1°00′ (3′) 2°00′ (6′) 3°00′ (10′) 45°00′.

1938. R. A. Fisher and F. Yates. STATISTI- 
CAL TABLES FOR BIOLOGICAL, AGRICULTURAL, AND 
MEDICAL RESEARCH. 

Table III [to 3 decimals] Distribution of t 
[Student* s t] . Argument is n, the d. o. f . , for 
n = 1 (1) 30, 40, 60, 120, oo. 

Table IV [4 and 5 figures] Distribution of 
Argument is n, the d. o. f . , for n = 1 (1) 30. 

Table V (C. G. Colcord, L. S. Deming, R. A. Fisher, H. W. Norton and F. Yates) [4 decimal places] Distribution of z. "z may be defined as the difference of one-half the natural logarithms of two different estimates of variance, one based on $n_1$ and the other on $n_2$ degrees of freedom." Arguments are $n_1$ = 1 (1) 6, 8, 12, 24, and $n_2$ = 1 (1) 30, 40, 60, 120, ∞. Tables for percentage points 20, 5, 1, and .1 are given. The second part of Table V tables [generally 3 figures] the variance ratio instead of z for the same arguments. Cf. Merrington and Thompson, "Tables of percentage points of the inverted beta (F) distribution," and Thompson, "Tables of percentage points of the incomplete beta-function," and also Comrie and Hartley, "Table of Lagrangian coefficients for harmonic interpolation in certain tables of percentage points."

Table VII [4 and 5 places] Transformation of 
r to z. Here z is defined as in [10:43]. The 
argument is z = .0 (.1) 3, 4. 

Table XV Latin squares. Sets up to 9 × 9 are given.

Table XX Scores for ordinal (or ranked) data. Herein is given "the average deviate of the rth largest of N observations drawn from a normal distribution having unit variance." Sample sizes up to N = 40.

Table XXIII Orthogonal polynomials. (R. A. Fisher, F. Yates and V. Satakopan) From n = 3 to n = 51.

Table XXVI [5 decimal places] "Natural logarithms." Arguments 1.00 (.01) 10 (1) 999.

Table XXVII Squares. Arguments 1 (1) 999.

Table XXVIII [5 figures] Square roots. Arguments 100 (1) 999, and also 10.0 (.1) 99.9.

Table XXIX [6 places] Reciprocals. Arguments 1.00 (.01) 9.99.

Table XXX [6 figures in the factorial and 7 decimal places in the logarithm of the factorial] Factorials. Arguments 1 (1) 200.

Table XXXIII [2 figures] Random numbers. Six sets of 1250 each of two-digit random numbers.

1939 to 1944. NATIONAL BUREAU OF STANDARDS TABLES, prepared by the FEDERAL WORKS AGENCY, Work Projects Administration, for the City of New York, Arnold N. Lowan, Technical Director.

Table of Natural Logarithms. [16 decimal places] Vol. I Integers from 1 to 50,000. Vol. II Integers from 50,000 to 100,000. Vol. III Decimal numbers from 0 to 5 by intervals of .0001. Vol. IV Decimal numbers from 5 to 10 by intervals of .0001. 1941.

Tables of the exponential function $e^x$. [12 to 18 decimals] Argument x = -2.5000 (.0001) 2.500 (.001) 5.00 (.01) 10.00. 1939.

Tables of circular and hyperbolic sines and cosines. [9 decimal places] Argument x (in radians) from 0 to 2 by intervals of .0001. 1939.

Tables of sines and cosines. [8 decimal places] Argument x (in radians) from 0 to 25 by intervals of .001. 1940.

Tables of probability functions. [15 decimal places]

Vol. I, 1941.

$\frac{2}{\sqrt{\pi}}\, e^{-x^2}$ (= $\sqrt{8}\, z$ of Table XV C herein);

$\frac{2}{\sqrt{\pi}} \int_0^x e^{-\alpha^2}\, d\alpha$ (= $1 - 2q = p - q$ of Table XV C herein);

Argument x (= $x/\sqrt{2}$ of Table XV C herein) = .0000 (.0001) 1.000 (.001) 5.946.

Vol. II, 1942.

$\frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$ (= z of Table XV C herein);

$\frac{1}{\sqrt{2\pi}} \int_{-x}^{x} e^{-\alpha^2/2}\, d\alpha$ (= $1 - 2q = p - q$ of Table XV C herein);

Argument x (= x of Table XV C herein) = .0000 (.0001) 1.000 (.001) 8.285.
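The correspondences stated for the two NBS volumes can be checked numerically. The Python sketch below assumes that Vol. I is entered with an argument equal to Kelley's deviate divided by $\sqrt{2}$, that Vol. II is entered with Kelley's deviate itself, and that z and q are the ordinate and the upper-tail area of the unit normal distribution. The helper names `nbs_vol1` and `kelley_quantities` are illustrative only.

```python
import math

def nbs_vol1(x_nbs: float) -> tuple[float, float]:
    """The two Vol. I functions at the NBS argument: (2/sqrt(pi)) e^(-x^2) and erf(x)."""
    return 2.0 / math.sqrt(math.pi) * math.exp(-x_nbs ** 2), math.erf(x_nbs)

def kelley_quantities(x: float) -> tuple[float, float]:
    """Unit-normal ordinate z at deviate x, and p - q = 1 - 2q, q being the upper-tail area."""
    z = math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)
    q = 0.5 * math.erfc(x / math.sqrt(2.0))
    return z, 1.0 - 2.0 * q

x = 1.96                                  # a deviate as used in Table XV C
f1, f2 = nbs_vol1(x / math.sqrt(2.0))     # Vol. I argument is x / sqrt(2)
z, p_minus_q = kelley_quantities(x)
print(f1, math.sqrt(8.0) * z)   # the two numbers agree to rounding error
print(f2, p_minus_q)            # the two numbers agree to rounding error
```

Vol. II needs no change of argument, so `kelley_quantities(x)` gives its two functions directly.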

(Herbert E. Salzer) Table of coefficients for inverse interpolation with central differences. [10 places] $m = (y - y_0)/(y_1 - y_0)$. Argument m = .000 (.001) .500. First printed in Journal of Mathematics and Physics, Vol. 22, no. 4, Dec., 1943.

(Herbert E. Salzer) Table of coefficients for inverse interpolation with advancing differences. [10 places] m as in preceding. First printed in Journal of Mathematics and Physics, Vol. 23, no. 2, May, 1944.

Tables of Lagrangian Interpolation Coefficients, 1944. n is the number of points used in a table having equally spaced arguments and p is the same as herein [13:93]. The tables give exact values, or values to 10 decimal places.

For n = 3 the argument p = -1 (.0001) 1
For n = 4 the argument p = -1 (.001) 0 (.0001) 1 (.001) 2
For n = 5 the argument p = -2 (.001) 2
For n = 6 the argument p = -2 (.01) 0 (.001) 1 (.01) 3
For n = 7 the argument p = -3 (.01) -1 (.001) 1 (.01) 3
For n = 8 the argument p = -3 (.1) 0 (.001) 1 (.1) 4
For n = 9 the argument p = -4 (.1) 4
For n = 10 the argument p = -4 (.1) 5
For n = 11 the argument p = -5 (.1) 5

Certain supplementary tables are given. These 
main tables are more extensive than Kelley's La- 
grangian interpolation coefficient tables, espe- 
cially in that they include 5, 7, 9, and 11 point 
interpolation, in that the interval in p for 
4 point is .0001 instead of .001, for 6 point is 
.001 instead of .01, for 8 point is .001 instead 
of .1, and in that 9 place decimal or exact values 
are given for n = 3 whereas Kelley rounded off 
to 5 decimal places to simplify use in the usual 
situation wherein quadric interpolation will not 
yield accuracy beyond an added 4 or 5 figures 
even though coefficient values are exact. 
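Every entry in such a table is the value of a Lagrangian basis polynomial at the fractional position p. Below is a minimal Python sketch. It uses exact rational arithmetic, so, like the NBS tables, it can give exact values. It assumes the n points lie at integer offsets, in units of the table interval, from the tabled argument from which p is measured. `lagrange_coefficients` is a hypothetical helper, not a routine taken from the NBS volume.

```python
from fractions import Fraction

def lagrange_coefficients(offsets: list[int], p: Fraction) -> list[Fraction]:
    """Exact Lagrangian interpolation coefficients.

    offsets: positions of the tabled entries, in table intervals, relative to
             the entry from which p is measured (e.g. [-1, 0, 1]).
    p:       fractional distance of the desired argument beyond that entry.
    The interpolated value is the sum of coefficient * tabled entry.
    """
    coeffs = []
    for k in offsets:
        num = Fraction(1)
        den = Fraction(1)
        for j in offsets:
            if j != k:
                num *= p - j      # product of (p - x_j) over the other points
                den *= k - j      # product of (x_k - x_j) over the other points
        coeffs.append(num / den)
    return coeffs

# Three-point case at p = .03: exact values behind a five-place tabled row
three = lagrange_coefficients([-1, 0, 1], Fraction(3, 100))
print([str(c) for c in three])

# Six-point case at p = 1/4; the coefficients of any Lagrangian formula sum to one
six = lagrange_coefficients([-2, -1, 0, 1, 2, 3], Fraction(1, 4))
assert sum(six) == 1
```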

1939. William Fleetwood Sheppard. THE PROBABILITY INTEGRAL. Completed and edited by the British Association for the Advancement of Science, Committee for the Calculation of Mathematical Tables. Cambridge University Press. Table I gives F(x) (= Kelley's $Q_x/z_x$; see [8:02] and [8:03]) to 12 decimals for argument x = .00 (.01) 10.00. Table II gives F(x) to 24 decimals for x = .0 (.1) 10.0. Table IV gives L(x) (= $-\ln q_x$) to 16 decimals for x = .0 (.1) 10.0. Table V gives 'C(x) (= $\log q_x$) to 12 decimals for x = .0 (.1) 10.0. Table VI gives 'fc(x) to 8 decimals for x = .00 (.01) 10.00.
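Sheppard's Table I is identified above with Kelley's $Q_x/z_x$: the area beyond x divided by the ordinate at x, commonly called Mills' ratio. Single entries can be spot-checked with a few lines of Python. This sketch computes the tail with the standard library's `math.erfc` rather than as 1 - Φ(x). The subtraction would lose every significant figure near x = 10, the end of Sheppard's range. The name `mills_ratio` is ours.

```python
import math

def mills_ratio(x: float) -> float:
    """Q_x / z_x for the unit normal: tail area beyond x divided by the ordinate at x."""
    q = 0.5 * math.erfc(x / math.sqrt(2.0))                 # upper-tail area Q_x
    z = math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)  # ordinate z_x
    return q / z

for x in (0.0, 1.0, 5.0, 10.0):
    print(f"{x:5.2f}  {mills_ratio(x):.12f}")
```

For x > 0 the ratio must lie between x/(x² + 1) and 1/x, which gives a quick sanity check on any recomputed entry.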

1941. L. J. Comrie and H. O. Hartley. Table of Lagrangian coefficients for harmonic interpolation in certain tables of percentage points, Biom., Vol. 32, 1941, pp. 183-186.

1941. Catherine M. Thompson. Table of percentage points of the $\chi^2$ distribution, Biom., Vol. 32, 1941, pp. 187-191. $\chi^2$, to 6 significant figures, is given for P = .995, .99, .975, .95, .90, .75, .50, .25, .10, .05, .025, .01, .005, and for d.o.f. = 1 (1) 30 (10) 100.

1941. Catherine M. Thompson. Tables of percentage points of the incomplete beta function. Biom., Vol. 32, pp. 168-181. The function $I_x(p, q)$ is that defined by Pearson in Tables of the incomplete beta-function. The percentage points are the values of x which, for given values of P, p, and q, satisfy the equation $I_x(p, q) = P$. x, to 5 significant figures, is given for the following values of P, of 2p [= Kelley's d.o.f. i] and of 2q [= Kelley's d.o.f. j]: P = .50, .25, .10, .05, .025, .01, .005; 2q = 1 (1) 10, 12, 15, 20, 24, 30, 40, 60, 120, ∞; 2p = 1 (1) 30, 40, 60, 120, ∞. Note that Fisher and Yates' Table V (by H. W. Norton), giving percentage points for the related function "z" for P = .20 and .001, supplements this table. Also cf. Comrie and Hartley, Table of Lagrangian coefficients for harmonic interpolation in certain tables of percentage points.

1942. Edward Charles Dixon Molina. POISSON'S EXPONENTIAL BINOMIAL LIMIT. Table I, individual terms, and Table II, cumulated terms. [6 decimal places] Argument interval varies from .001 to 1.

1942. SMITHSONIAN MATHEMATICAL TABLES, Hyperbolic Functions. Fifth reprint, 1942, by George F. Becker and C. E. Van Orstrand. Derivatives recorded in addition to items noted below.

Table I. [5 decimal places] Logarithms of hyperbolic functions. Argument u = .0000 (.0001) .100 (.001) 3.00 (.01) 6.00.

Table II. [5 decimal places] Natural hyperbolic functions. Same arguments as in Table I. As an illustration of the use of this table, note that u herein is Fisher's z from r, and tanh u is r.
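The relation used in this illustration, r = tanh z and hence z = artanh r = ½ ln[(1 + r)/(1 - r)], can be checked directly. A minimal Python sketch using the standard library's `math.atanh` and `math.tanh`; the helper names are illustrative.

```python
import math

def fisher_z(r: float) -> float:
    """Fisher's z for a correlation coefficient r (|r| < 1): z = artanh r."""
    return math.atanh(r)

def r_from_z(z: float) -> float:
    """Inverse transformation r = tanh z, i.e. the tabled tanh u read with u = z."""
    return math.tanh(z)

for r in (0.10, 0.50, 0.90):
    z = fisher_z(r)
    print(f"r = {r:.2f}   z = {z:.5f}   tanh z = {r_from_z(z):.5f}")
```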

Table III. [5 decimal places] Natural and logarithmic circular functions. Argument u = .0000 (.0001) .100 (.001) 1.600. A supplementary table using a coarser argument provides the functions for higher values of u.

Table IV. Ascending and descending exponentials and $\log_{10}(e^u)$. For argument u is given $\log_{10}(e^u)$ to 7 decimal places; $e^u$ to 7 or more significant figures; $e^{-u}$ to 7 decimal places. Argument u = .000 (.001) 3.00 (.01) ... 15.00. A supplementary table using a coarser argument provides the functions for higher values of u.

Table V. [6 places] Natural logarithms. Argument u = 0 (1) 1000 and thence by irregular arguments to 10,000. The same table is serviceable for obtaining $e^x$ for irregular arguments in x.

Table VI. [7 decimal places] The Gudermannian. Argument u = .000 (.001) 3.00 (.01) 6.00. Table VII gives the "Anti-Gudermannian."

1942-1943. Egon S. Pearson and H. O. Hartley. The probability integral of the range in samples of N observations from a normal population. Biom., Vol. 32, 1942, pp. 301-310. The arguments are N = 2 (1) 20, and W, the ratio of a specific range to the population standard deviation, = .00 (.05) 7.25. The tabled entry is, to 4 decimal places, the cumulated frequency to the (N, W) point in question. The same authors, in Tables of the probability integral of the studentized range, Biom., Vol. 33, 1943, pp. 89-99, employ a second sample to secure an independent estimate of the population standard deviation, and thence give tables for calculating the desired probability integral.

1943. M. Merrington and Catherine M. Thompson. Tables of percentage points of the inverted beta (F) distribution. Biom., Vol. 33, 1943, pp. 73-88. F is Snedecor's F (Kelley's $F_{ij}$), the variance ratio. F is tabled, to 5 significant figures, for P = .50, .25, .10, .05, .025, .01, .005, for $\nu_1$ (Kelley's d.o.f. i) = 1 (1) 10, 12, 15, 20, 24, 30, 40, 60, 120, ∞, and for $\nu_2$ (Kelley's d.o.f. j) = 1 (1) 30, 40, 60, 120, ∞. Note that Fisher and Yates' Table V (by H. W. Norton), giving percentage points for P = .20 and .001, supplements this table. Also cf. Comrie and Hartley, Table of Lagrangian coefficients for harmonic interpolation in certain tables of percentage points.

1944. BESSEL FUNCTIONS. For references to the extensive tables of these functions and to literature about them see Harry Bateman and Raymond Clare Archibald, A Guide to Tables of Bessel Functions, Mathematical Tables and Other Aids to Computation, Vol. I, No. 7, July, 1944.

In press, 1947. The Truman Lee KELLEY STATISTICAL TABLES.

Table I. Eight place normal distribution, simple correlation, and probability functions. The argument p = .5000 (.0001) .9999, or its complement, q = 1 - p, may variously be considered the proportionate area in the tail of a normal distribution, a probability, a positive correlation coefficient, a sine, or a cosine of an angle, etc. Tabled are x and z, the deviate and ordinate in a unit normal distribution, $\sqrt{pq}$, $\sqrt{1-p^2}$, $\sqrt{1-q^2}$, and also $E_{I}$ and $E_{II}$, the errors in linear and in quadric interpolation, and, for p and q, $E^{-1}_{I}$ and $E^{-1}_{II}$, the errors in inverse linear and quadric interpolation. A supplementary table for p > .9999 is given. Table XV C herein is an abridgement and modification of this table.
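The principal columns of such a table can be recomputed with the Python standard library (`statistics.NormalDist`, available since Python 3.8). This sketch assumes x is the deviate below which the proportion p of the unit normal distribution lies, so that q lies beyond it. It does not attempt the interpolation-error columns. `table_i_row` is an illustrative name.

```python
import math
from statistics import NormalDist

UNIT_NORMAL = NormalDist()  # mean 0, standard deviation 1

def table_i_row(p: float) -> dict[str, float]:
    """Quantities listed for Table I at argument p (.5 <= p < 1), with q = 1 - p.

    x is the deviate below which the proportion p lies (q lies beyond it);
    z is the ordinate of the unit normal curve at x.
    """
    q = 1.0 - p
    x = UNIT_NORMAL.inv_cdf(p)
    return {
        "x": x,
        "z": UNIT_NORMAL.pdf(x),
        "sqrt(pq)": math.sqrt(p * q),
        "sqrt(1-p^2)": math.sqrt(1.0 - p * p),
        "sqrt(1-q^2)": math.sqrt(1.0 - q * q),
    }

for name, value in table_i_row(0.9750).items():
    print(f"{name:12s} {value:.8f}")
```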

Table II. Five place [fifth place rounded off] three-point interpolation coefficients. Argument p (a proportionate distance between two tabled arguments in any table having equally spaced arguments) = .0000 (.0001) .5000. Table XV A herein is an abridgement of this table.

Table III. Seven place [seventh place compensatorially rounded off] four-point interpolation coefficients. Argument p = .000 (.001) .500. Table XV B herein is an abridgement of this table.

Table IV. Ten place [tenth place rounded off] six-point interpolation coefficients. Argument p = .00 (.01) .50.

Table V. Eleven place (exact) eight-point interpolation coefficients. Argument p = .0 (.1) 1.0.

Table VI. Four-place $\chi^2$ functions. Argument $\chi^2$/d.o.f. = .0 (.1) 4.1. P, the probability that a divergence as great as $\chi^2$ will arise as a matter of chance, is tabled for 1 (1) 10, 12, 15, 19, 24, 30 d.o.f. Also direct and inverse interpolation errors $E_{I}$, $E_{II}$, $E^{-1}_{I}$, $E^{-1}_{II}$.
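For k degrees of freedom, P = Q(k/2, χ²/2), where Q is the regularized upper incomplete gamma function, so entries of this kind can be recomputed. The sketch below sums the standard power series for the lower function and subtracts the result from one. That is adequate for moderate arguments like those of Table VI, but it is an illustration rather than a production routine, and `chi2_tail` is a hypothetical name. The table is entered with χ²/d.o.f., so a tabled argument must first be multiplied by the degrees of freedom.

```python
import math

def chi2_tail(chi2: float, dof: int) -> float:
    """Probability that chi-square with `dof` degrees of freedom is at least `chi2`.

    Returns 1 - P(a, x) with a = dof/2, x = chi2/2, where the regularized lower
    incomplete gamma function is summed from its power series
    P(a, x) = e^-x x^a / Gamma(a) * sum_{n>=0} x^n / (a (a+1) ... (a+n)).
    """
    if chi2 <= 0.0:
        return 1.0
    a, x = dof / 2.0, chi2 / 2.0
    term = 1.0 / a          # n = 0 term of the series
    total = term
    n = 0
    while term > 1e-17 * total:
        n += 1
        term *= x / (a + n)
        total += term
    lower = math.exp(-x + a * math.log(x) - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)

# Table VI argument chi^2/d.o.f. = 2.0 with 10 d.o.f., i.e. chi^2 = 20.0
print(round(chi2_tail(2.0 * 10, 10), 4))
```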

Table VII. Eight place square roots, cube roots (for all three place numbers) and natural logarithms, for N = 1.0 (.01) 10.0. Table XV D herein is an abridgement of this table.

Table VIII. Eight place & 1 functions for normalizing $F_{ij}$, the variance ratio. Argument i (= d.o.f.) = 1 (1) 100 (10) 200 (20) 300 (50) 500 (100) 1000, ∞. Table IX E herein is an abridgement of this table.

section 2. LAGRANGIAN INTERPOLATION COEFFICIENTS 

Table XV A, which is a 90 per cent abridgement of the three-point Lagrangian interpolation coefficients given in The Kelley Statistical Tables (in press, 1947), gives, for p = .000 (.001) .500, the $c_0$, $c_1$, and $c_2$ coefficients of [13:98a], with such compensatory rounding off to five decimal places as to make the rounding off error very small. (See Kelley, Tables, in press, 1947.) When p < .5, coefficients and tabled entries $t_0$, $t_1$, and $t_2$ are multiplied and summed as in [13:98a], and when p > .5, then q is the desired fraction and computation proceeds in the reverse direction using $t_1$, $t_0$, and
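Checking the printed rows suggests that the tabled magnitudes are the three-point Lagrangian values $c_0 = p(1-p)/2$, $c_1 = 1-p^2$, and $c_2 = p(1+p)/2$, with $c_0$ entering negatively. For example, p = .030 gives .01455, .99910, and .01545. In every row we checked, the compensation is consistent with rounding $c_0$ and $c_2$ and then setting $c_1 = 1 + c_0 - c_2$, so that the signed sum is exactly 1. Both of these are inferences from the printed values, not statements made in the text. The Python sketch below also assumes that the desired argument lies the fraction p beyond $t_1$, toward $t_2$. Because [13:98a] is not reproduced in this section, the assignment of $t_0$, $t_1$, $t_2$ to positions is likewise an assumption. All function names are illustrative.

```python
from decimal import Decimal, ROUND_HALF_UP

FIVE_PLACES = Decimal("0.00001")

def three_point_coefficients(p: Decimal) -> tuple[Decimal, Decimal, Decimal]:
    """Exact magnitudes of c0, c1, c2 at fraction p; c0 enters with a minus sign."""
    c0 = p * (1 - p) / 2
    c1 = 1 - p * p
    c2 = p * (1 + p) / 2
    return c0, c1, c2

def tabled_row(p: Decimal) -> tuple[Decimal, Decimal, Decimal]:
    """Five-place row: round c0 and c2, then take c1 = 1 + c0 - c2 so that
    -c0 + c1 + c2 is exactly 1 (an assumed reading of the compensatory rounding)."""
    c0, _, c2 = three_point_coefficients(p)
    c0 = c0.quantize(FIVE_PLACES, rounding=ROUND_HALF_UP)
    c2 = c2.quantize(FIVE_PLACES, rounding=ROUND_HALF_UP)
    return c0, 1 + c0 - c2, c2

def interpolate3(t0: float, t1: float, t2: float, p: float) -> float:
    """-c0*t0 + c1*t1 + c2*t2, assuming t0, t1, t2 are consecutive entries
    and the desired argument lies the fraction p beyond t1, toward t2."""
    c0 = p * (1 - p) / 2
    c1 = 1 - p * p
    c2 = p * (1 + p) / 2
    return -c0 * t0 + c1 * t1 + c2 * t2

print(tabled_row(Decimal("0.037")))  # compare with the .037 row of Table XV A
```

A quadratic is reproduced exactly. With t0, t1, t2 = 10, 5, 6 (the values of 3x² - 2x + 5 at x = -1, 0, 1), `interpolate3(10, 5, 6, 0.25)` returns 4.6875 to within floating-point rounding.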

Table XV B, a 90 per cent abridgement of the four-point coefficients given in The Kelley Statistical Tables, gives, for p = .00 (.01) .50, the $c_{-1}$, $c_0$, $c_1$, $c_2$ coefficients of [13:99a].
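The four-point rows agree, in the same way, with the cubic Lagrangian coefficients (again an inference from the printed values). These are $c_{-1} = p(1-p)(2-p)/6$ and $c_2 = p(1-p^2)/6$, both subtracted, and $c_0 = (1-p^2)(2-p)/2$ and $c_1 = p(1+p)(2-p)/2$, both added. At p = .25 they are exactly .0546875, .8203125, .2734375, and .0390625. The Python sketch below assumes consecutive entries $t_{-1}, t_0, t_1, t_2$, with the desired argument the fraction p beyond $t_0$, toward $t_1$. The function names are illustrative.

```python
from fractions import Fraction

def four_point_coefficients(p):
    """Magnitudes of c_-1, c_0, c_1, c_2 at fraction p (a float or a Fraction).

    c_-1 and c_2 enter with minus signs, c_0 and c_1 with plus signs.
    """
    cm1 = p * (1 - p) * (2 - p) / 6
    c0 = (1 - p * p) * (2 - p) / 2
    c1 = p * (1 + p) * (2 - p) / 2
    c2 = p * (1 - p * p) / 6
    return cm1, c0, c1, c2

def interpolate4(tm1, t0, t1, t2, p):
    """-c_-1*t_-1 + c_0*t_0 + c_1*t_1 - c_2*t_2, assuming consecutive entries
    and a desired argument the fraction p beyond t_0, toward t_1."""
    cm1, c0, c1, c2 = four_point_coefficients(p)
    return -cm1 * tm1 + c0 * t0 + c1 * t1 - c2 * t2

# Exact coefficients at p = 1/4, for comparison with the .25 row of Table XV B
print([str(c) for c in four_point_coefficients(Fraction(1, 4))])
```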
TABLE XV A

THREE-POINT INTERPOLATION COEFFICIENTS

  p    c0(-)    c1(+)   c2(+)         p    c0(-)    c1(+)   c2(+)
.000   .00000  1.00000  .00000      .030   .01455   .99910  .01545
.001   .00050  1.00000  .00050      .031   .01502   .99904  .01598
.002   .00100  1.00000  .00100      .032   .01549   .99898  .01651
.003   .00150  1.00000  .00150      .033   .01596   .99892  .01704
.004   .00199   .99998  .00201      .034   .01642   .99884  .01758
.005   .00249   .99998  .00251      .035   .01689   .99878  .01811
.006   .00298   .99996  .00302      .036   .01735   .99870  .01865
.007   .00348   .99996  .00352      .037   .01782   .99864  .01918
.008   .00397   .99994  .00403      .038   .01828   .99856  .01972
.009   .00446   .99992  .00454      .039   .01874   .99848  .02026
.010   .00495   .99990  .00505      .040   .01920   .99840  .02080
.011   .00544   .99988  .00556      .041   .01966   .99832  .02134
.012   .00593   .99986  .00607      .042   .02012   .99824  .02188
.013   .00642   .99984  .00658      .043   .02058   .99816  .02242
.014   .00690   .99980  .00710      .044   .02103   .99806  .02297
.015   .00739   .99978  .00761      .045   .02149   .99798  .02351
.016   .00787   .99974  .00813      .046   .02194   .99788  .02406
.017   .00836   .99972  .00864      .047   .02240   .99780  .02460
.018   .00884   .99968  .00916      .048   .02285   .99770  .02515
.019   .00932   .99964  .00968      .049   .02330   .99760  .02570
.020   .00980   .99960  .01020      .050   .02375   .99750  .02625
.021   .01028   .99956  .01072      .051   .02420   .99740  .02680
.022   .01076   .99952  .01124      .052   .02465   .99730  .02735
.023   .01124   .99948  .01176      .053   .02510   .99720  .02790
.024   .01171   .99942  .01229      .054   .02554   .99708  .02846
.025   .01219   .99938  .01281      .055   .02599   .99698  .02901
.026   .01266   .99932  .01334      .056   .02643   .99686  .02957
.027   .01314   .99928  .01386      .057   .02688   .99676  .03012
.028   .01361   .99922  .01439      .058   .02732   .99664  .03068
.029   .01408   .99916  .01492      .059   .02776   .99652  .03124
.030   .01455   .99910  .01545      .060   .02820   .99640  .03180
  p    c0(-)    c1(+)   c2(+)         p    c0(-)    c1(+)   c2(+)
.060   .02820   .99640  .03180      .090   .04095   .99190  .04905
.061   .02864   .99628  .03236      .091   .04136   .99172  .04964
.062   .02908   .99616  .03292      .092   .04177   .99154  .05023
.063   .02952   .99604  .03348      .093   .04218   .99136  .05082
.064   .02995   .99590  .03405      .094   .04258   .99116  .05142
.065   .03039   .99578  .03461      .095   .04299   .99098  .05201
.066   .03082   .99564  .03518      .096   .04339   .99078  .05261
.067   .03126   .99552  .03574      .097   .04380   .99060  .05320
.068   .03169   .99538  .03631      .098   .04420   .99040  .05380
.069   .03212   .99524  .03688      .099   .04460   .99020  .05440
.070   .03255   .99510  .03745      .100   .04500   .99000  .05500
.071   .03298   .99496  .03802      .101   .04540   .98980  .05560
.072   .03341   .99482  .03859      .102   .04580   .98960  .05620
.073   .03384   .99468  .03916      .103   .04620   .98940  .05680
.074   .03426   .99452  .03974      .104   .04659   .98918  .05741
.075   .03469   .99438  .04031      .105   .04699   .98898  .05801
.076   .03511   .99422  .04089      .106   .04738   .98876  .05862
.077   .03554   .99408  .04146      .107   .04778   .98856  .05922
.078   .03596   .99392  .04204      .108   .04817   .98834  .05983
.079   .03638   .99376  .04262      .109   .04856   .98812  .06044
.080   .03680   .99360  .04320      .110   .04895   .98790  .06105
.081   .03722   .99344  .04378      .111   .04934   .98768  .06166
.082   .03764   .99328  .04436      .112   .04973   .98746  .06227
.083   .03806   .99312  .04494      .113   .05012   .98724  .06288
.084   .03847   .99294  .04553      .114   .05050   .98700  .06350
.085   .03889   .99278  .04611      .115   .05089   .98678  .06411
.086   .03930   .99260  .04670      .116   .05127   .98654  .06473
.087   .03972   .99244  .04728      .117   .05166   .98632  .06534
.088   .04013   .99226  .04787      .118   .05204   .98608  .06596
.089   .04054   .99208  .04846      .119   .05242   .98584  .06658
.090   .04095   .99190  .04905      .120   .05280   .98560  .06720
  p    c0(-)    c1(+)   c2(+)         p    c0(-)    c1(+)   c2(+)
.120   .05280   .98560  .06720      .150   .06375   .97750  .08625
.121   .05318   .98536  .06782      .151   .06410   .97720  .08690
.122   .05356   .98512  .06844      .152   .06445   .97690  .08755
.123   .05394   .98488  .06906      .153   .06480   .97660  .08820
.124   .05431   .98462  .06969      .154   .06514   .97628  .08886
.125   .05469   .98438  .07031      .155   .06549   .97598  .08951
.126   .05506   .98412  .07094      .156   .06583   .97566  .09017
.127   .05544   .98388  .07156      .157   .06618   .97536  .09082
.128   .05581   .98362  .07219      .158   .06652   .97504  .09148
.129   .05618   .98336  .07282      .159   .06686   .97472  .09214
.130   .05655   .98310  .07345      .160   .06720   .97440  .09280
.131   .05692   .98284  .07408      .161   .06754   .97408  .09346
.132   .05729   .98258  .07471      .162   .06788   .97376  .09412
.133   .05766   .98232  .07534      .163   .06822   .97344  .09478
.134   .05802   .98204  .07598      .164   .06855   .97310  .09545
.135   .05839   .98178  .07661      .165   .06889   .97278  .09611
.136   .05875   .98150  .07725      .166   .06922   .97244  .09678
.137   .05912   .98124  .07788      .167   .06956   .97212  .09744
.138   .05948   .98096  .07852      .168   .06989   .97178  .09811
.139   .05984   .98068  .07916      .169   .07022   .97144  .09878
.140   .06020   .98040  .07980      .170   .07055   .97110  .09945
.141   .06056   .98012  .08044      .171   .07088   .97076  .10012
.142   .06092   .97984  .08108      .172   .07121   .97042  .10079
.143   .06128   .97956  .08172      .173   .07154   .97008  .10146
.144   .06163   .97926  .08237      .174   .07186   .96972  .10214
.145   .06199   .97898  .08301      .175   .07219   .96938  .10281
.146   .06234   .97868  .08366      .176   .07251   .96902  .10349
.147   .06270   .97840  .08430      .177   .07284   .96868  .10416
.148   .06305   .97810  .08495      .178   .07316   .96832  .10484
.149   .06340   .97780  .08560      .179   .07348   .96796  .10552
.150   .06375   .97750  .08625      .180   .07380   .96760  .10620
  p    c0(-)    c1(+)   c2(+)         p    c0(-)    c1(+)   c2(+)
.180   .07380   .96760  .10620      .210   .08295   .95590  .12705
.181   .07412   .96724  .10688      .211   .08324   .95548  .12776
.182   .07444   .96688  .10756      .212   .08353   .95506  .12847
.183   .07476   .96652  .10824      .213   .08382   .95464  .12918
.184   .07507   .96614  .10893      .214   .08410   .95420  .12990
.185   .07539   .96578  .10961      .215   .08439   .95378  .13061
.186   .07570   .96540  .11030      .216   .08467   .95334  .13133
.187   .07602   .96504  .11098      .217   .08496   .95292  .13204
.188   .07633   .96466  .11167      .218   .08524   .95248  .13276
.189   .07664   .96428  .11236      .219   .08552   .95204  .13348
.190   .07695   .96390  .11305      .220   .08580   .95160  .13420
.191   .07726   .96352  .11374      .221   .08608   .95116  .13492
.192   .07757   .96314  .11443      .222   .08636   .95072  .13564
.193   .07788   .96276  .11512      .223   .08664   .95028  .13636
.194   .07818   .96236  .11582      .224   .08691   .94982  .13709
.195   .07849   .96198  .11651      .225   .08719   .94938  .13781
.196   .07879   .96158  .11721      .226   .08746   .94892  .13854
.197   .07910   .96120  .11790      .227   .08774   .94848  .13926
.198   .07940   .96080  .11860      .228   .08801   .94802  .13999
.199   .07970   .96040  .11930      .229   .08828   .94756  .14072
.200   .08000   .96000  .12000      .230   .08855   .94710  .14145
.201   .08030   .95960  .12070      .231   .08882   .94664  .14218
.202   .08060   .95920  .12140      .232   .08909   .94618  .14291
.203   .08090   .95880  .12210      .233   .08936   .94572  .14364
.204   .08119   .95838  .12281      .234   .08962   .94524  .14438
.205   .08149   .95798  .12351      .235   .08989   .94478  .14511
.206   .08178   .95756  .12422      .236   .09015   .94430  .14585
.207   .08208   .95716  .12492      .237   .09042   .94384  .14658
.208   .08237   .95674  .12563      .238   .09068   .94336  .14732
.209   .08266   .95632  .12634      .239   .09094   .94288  .14806
.210   .08295   .95590  .12705      .240   .09120   .94240  .14880
  p    c0(-)    c1(+)   c2(+)         p    c0(-)    c1(+)   c2(+)
.240   .09120   .94240  .14880      .270   .09855   .92710  .17145
.241   .09146   .94192  .14954      .271   .09878   .92656  .17222
.242   .09172   .94144  .15028      .272   .09901   .92602  .17299
.243   .09198   .94096  .15102      .273   .09924   .92548  .17376
.244   .09223   .94046  .15177      .274   .09946   .92492  .17454
.245   .09249   .93998  .15251      .275   .09969   .92438  .17531
.246   .09274   .93948  .15326      .276   .09991   .92382  .17609
.247   .09300   .93900  .15400      .277   .10014   .92328  .17686
.248   .09325   .93850  .15475      .278   .10036   .92272  .17764
.249   .09350   .93800  .15550      .279   .10058   .92216  .17842
.250   .09375   .93750  .15625      .280   .10080   .92160  .17920
.251   .09400   .93700  .15700      .281   .10102   .92104  .17998
.252   .09425   .93650  .15775      .282   .10124   .92048  .18076
.253   .09450   .93600  .15850      .283   .10146   .91992  .18154
.254   .09474   .93548  .15926      .284   .10167   .91934  .18233
.255   .09499   .93498  .16001      .285   .10189   .91878  .18311
.256   .09523   .93446  .16077      .286   .10210   .91820  .18390
.257   .09548   .93396  .16152      .287   .10232   .91764  .18468
.258   .09572   .93344  .16228      .288   .10253   .91706  .18547
.259   .09596   .93292  .16304      .289   .10274   .91648  .18626
.260   .09620   .93240  .16380      .290   .10295   .91590  .18705
.261   .09644   .93188  .16456      .291   .10316   .91532  .18784
.262   .09668   .93136  .16532      .292   .10337   .91474  .18863
.263   .09692   .93084  .16608      .293   .10358   .91416  .18942
.264   .09715   .93030  .16685      .294   .10378   .91356  .19022
.265   .09739   .92978  .16761      .295   .10399   .91298  .19101
.266   .09762   .92924  .16838      .296   .10419   .91238  .19181
.267   .09786   .92872  .16914      .297   .10440   .91180  .19260
.268   .09809   .92818  .16991      .298   .10460   .91120  .19340
.269   .09832   .92764  .17068      .299   .10480   .91060  .19420
.270   .09855   .92710  .17145      .300   .10500   .91000  .19500
  p    c0(-)    c1(+)   c2(+)         p    c0(-)    c1(+)   c2(+)
.300   .10500   .91000  .19500      .330   .11055   .89110  .21945
.301   .10520   .90940  .19580      .331   .11072   .89044  .22028
.302   .10540   .90880  .19660      .332   .11089   .88978  .22111
.303   .10560   .90820  .19740      .333   .11106   .88912  .22194
.304   .10579   .90758  .19821      .334   .11122   .88844  .22278
.305   .10599   .90698  .19901      .335   .11139   .88778  .22361
.306   .10618   .90636  .19982      .336   .11155   .88710  .22445
.307   .10638   .90576  .20062      .337   .11172   .88644  .22528
.308   .10657   .90514  .20143      .338   .11188   .88576  .22612
.309   .10676   .90452  .20224      .339   .11204   .88508  .22696
.310   .10695   .90390  .20305      .340   .11220   .88440  .22780
.311   .10714   .90328  .20386      .341   .11236   .88372  .22864
.312   .10733   .90266  .20467      .342   .11252   .88304  .22948
.313   .10752   .90204  .20548      .343   .11268   .88236  .23032
.314   .10770   .90140  .20630      .344   .11283   .88166  .23117
.315   .10789   .90078  .20711      .345   .11299   .88098  .23201
.316   .10807   .90014  .20793      .346   .11314   .88028  .23286
.317   .10826   .89952  .20874      .347   .11330   .87960  .23370
.318   .10844   .89888  .20956      .348   .11345   .87890  .23455
.319   .10862   .89824  .21038      .349   .11360   .87820  .23540
.320   .10880   .89760  .21120      .350   .11375   .87750  .23625
.321   .10898   .89696  .21202      .351   .11390   .87680  .23710
.322   .10916   .89632  .21284      .352   .11405   .87610  .23795
.323   .10934   .89568  .21366      .353   .11420   .87540  .23880
.324   .10951   .89502  .21449      .354   .11434   .87468  .23966
.325   .10969   .89438  .21531      .355   .11449   .87398  .24051
.326   .10986   .89372  .21614      .356   .11463   .87326  .24137
.327   .11004   .89308  .21696      .357   .11478   .87256  .24222
.328   .11021   .89242  .21779      .358   .11492   .87184  .24308
.329   .11038   .89176  .21862      .359   .11506   .87112  .24394
.330   .11055   .89110  .21945      .360   .11520   .87040  .24480
  p    c0(-)    c1(+)   c2(+)         p    c0(-)    c1(+)   c2(+)
.360   .11520   .87040  .24480      .390   .11895   .84790  .27105
.361   .11534   .86968  .24566      .391   .11906   .84712  .27194
.362   .11548   .86896  .24652      .392   .11917   .84634  .27283
.363   .11562   .86824  .24738      .393   .11928   .84556  .27372
.364   .11575   .86750  .24825      .394   .11938   .84476  .27462
.365   .11589   .86678  .24911      .395   .11949   .84398  .27551
.366   .11602   .86604  .24998      .396   .11959   .84318  .27641
.367   .11616   .86532  .25084      .397   .11970   .84240  .27730
.368   .11629   .86458  .25171      .398   .11980   .84160  .27820
.369   .11642   .86384  .25258      .399   .11990   .84080  .27910
.370   .11655   .86310  .25345      .400   .12000   .84000  .28000
.371   .11668   .86236  .25432      .401   .12010   .83920  .28090
.372   .11681   .86162  .25519      .402   .12020   .83840  .28180
.373   .11694   .86088  .25606      .403   .12030   .83760  .28270
.374   .11706   .86012  .25694      .404   .12039   .83678  .28361
.375   .11719   .85938  .25781      .405   .12049   .83598  .28451
.376   .11731   .85862  .25869      .406   .12058   .83516  .28542
.377   .11744   .85788  .25956      .407   .12068   .83436  .28632
.378   .11756   .85712  .26044      .408   .12077   .83354  .28723
.379   .11768   .85636  .26132      .409   .12086   .83272  .28814
.380   .11780   .85560  .26220      .410   .12095   .83190  .28905
.381   .11792   .85484  .26308      .411   .12104   .83108  .28996
.382   .11804   .85408  .26396      .412   .12113   .83026  .29087
.383   .11816   .85332  .26484      .413   .12122   .82944  .29178
.384   .11827   .85254  .26573      .414   .12130   .82860  .29270
.385   .11839   .85178  .26661      .415   .12139   .82778  .29361
.386   .11850   .85100  .26750      .416   .12147   .82694  .29453
.387   .11862   .85024  .26838      .417   .12156   .82612  .29544
.388   .11873   .84946  .26927      .418   .12164   .82528  .29636
.389   .11884   .84868  .27016      .419   .12172   .82444  .29728
.390   .11895   .84790  .27105      .420   .12180   .82360  .29820
  p    c0(-)    c1(+)   c2(+)         p    c0(-)    c1(+)   c2(+)
.420   .12180   .82360  .29820      .450   .12375   .79750  .32625
.421   .12188   .82276  .29912      .451   .12380   .79660  .32720
.422   .12196   .82192  .30004      .452   .12385   .79570  .32815
.423   .12204   .82108  .30096      .453   .12390   .79480  .32910
.424   .12211   .82022  .30189      .454   .12394   .79388  .33006
.425   .12219   .81938  .30281      .455   .12399   .79298  .33101
.426   .12226   .81852  .30374      .456   .12403   .79206  .33197
.427   .12234   .81768  .30466      .457   .12408   .79116  .33292
.428   .12241   .81682  .30559      .458   .12412   .79024  .33388
.429   .12248   .81596  .30652      .459   .12416   .78932  .33484
.430   .12255   .81510  .30745      .460   .12420   .78840  .33580
.431   .12262   .81424  .30838      .461   .12424   .78748  .33676
.432   .12269   .81338  .30931      .462   .12428   .78656  .33772
.433   .12276   .81252  .31024      .463   .12432   .78564  .33868
.434   .12282   .81164  .31118      .464   .12435   .78470  .33965
.435   .12289   .81078  .31211      .465   .12439   .78378  .34061
.436   .12295   .80990  .31305      .466   .12442   .78284  .34158
.437   .12302   .80904  .31398      .467   .12446   .78192  .34254
.438   .12308   .80816  .31492      .468   .12449   .78098  .34351
.439   .12314   .80728  .31586      .469   .12452   .78004  .34448
.440   .12320   .80640  .31680      .470   .12455   .77910  .34545
.441   .12326   .80552  .31774      .471   .12458   .77816  .34642
.442   .12332   .80464  .31868      .472   .12461   .77722  .34739
.443   .12338   .80376  .31962      .473   .12464   .77628  .34836
.444   .12343   .80286  .32057      .474   .12466   .77532  .34934
.445   .12349   .80198  .32151      .475   .12469   .77438  .35031
.446   .12354   .80108  .32246      .476   .12471   .77342  .35129
.447   .12360   .80020  .32340      .477   .12474   .77248  .35226
.448   .12365   .79930  .32435      .478   .12476   .77152  .35324
.449   .12370   .79840  .32530      .479   .12478   .77056  .35422
.450   .12375   .79750  .32625      .480   .12480   .76960  .35520
  p    c0(-)    c1(+)   c2(+)         p    c0(-)    c1(+)   c2(+)
.480   .12480   .76960  .35520      .490   .12495   .75990  .36505
.481   .12482   .76864  .35618      .491   .12496   .75892  .36604
.482   .12484   .76768  .35716      .492   .12497   .75794  .36703
.483   .12486   .76672  .35814      .493   .12498   .75696  .36802
.484   .12487   .76574  .35913      .494   .12498   .75596  .36902
.485   .12489   .76478  .36011      .495   .12499   .75498  .37001
.486   .12490   .76380  .36110      .496   .12499   .75398  .37101
.487   .12492   .76284  .36208      .497   .12500   .75300  .37200
.488   .12493   .76186  .36307      .498   .12500   .75200  .37300
.489   .12494   .76088  .36406      .499   .12500   .75100  .37400
.490   .12495   .75990  .36505      .500   .12500   .75000  .37500
TABLE XV B

FOUR-POINT INTERPOLATION COEFFICIENTS

 p     c-1(-)        c0(+)       c1(+)       c2(-)       p
(p<.5)                                                (p>.5)
.00   .00000 00   1.00000 00   .00000 00   .00000 00    1.00
.01   .00328 35    .99490 05   .01004 95   .00166 65     .99
.02   .00646 80    .98960 40   .02019 60   .00333 20     .98
.03   .00955 45    .98411 35   .03043 65   .00499 55     .97
.04   .01254 40    .97843 20   .04076 80   .00665 60     .96
.05   .01543 75    .97256 25   .05118 75   .00831 25     .95
.06   .01823 60    .96650 80   .06169 20   .00996 40     .94
.07   .02094 05    .96027 15   .07227 85   .01160 95     .93
.08   .02355 20    .95385 60   .08294 40   .01324 80     .92
.09   .02607 15    .94726 45   .09368 55   .01487 85     .91
.10   .02850 00    .94050 00   .10450 00   .01650 00     .90
.11   .03083 85    .93356 55   .11538 45   .01811 15     .89
.12   .03308 80    .92646 40   .12633 60   .01971 20     .88
.13   .03524 95    .91919 85   .13735 15   .02130 05     .87
.14   .03732 40    .91177 20   .14842 80   .02287 60     .86
.15   .03931 25    .90418 75   .15956 25   .02443 75     .85
.16   .04121 60    .89644 80   .17075 20   .02598 40     .84
.17   .04303 55    .88855 65   .18199 35   .02751 45     .83
.18   .04477 20    .88051 60   .19328 40   .02902 80     .82
.19   .04642 65    .87232 95   .20462 05   .03052 35     .81
.20   .04800 00    .86400 00   .21600 00   .03200 00     .80
.21   .04949 35    .85553 05   .22741 95   .03345 65     .79
.22   .05090 80    .84692 40   .23887 60   .03489 20     .78
.23   .05224 45    .83818 35   .25036 65   .03630 55     .77
.24   .05350 40    .82931 20   .26188 80   .03769 60     .76
.25   .05468 75    .82031 25   .27343 75   .03906 25     .75
.26   .05579 80    .81118 80   .28501 20   .04040 40     .74
.27   .05683 05    .80194 15   .29660 85   .04171 95     .73
.28   .05779 20    .79257 60   .30822 40   .04300 80     .72
.29   .05868 15    .78309 45   .31985 55   .04426 85     .71
.30   .05950 00    .77350 00   .33150 00   .04550 00     .70
TABLE XV B

FOUR-POINT INTERPOLATION COEFFICIENTS

p       C-1          C0          C1          C2           p
(p<.5)  (-)         (+)         (+)         (-)        (p>.5)
.30     .05950 00    .77350 00   .33150 00   .04550 00    .70
.31     .06024 85    .76379 55   .34315 45   .04670 15    .69
.32     .06092 80    .75398 40   .35481 60   .04787 20    .68
.33     .06153 95    .74406 85   .36648 15   .04901 05    .67
.34     .06208 40    .73405 20   .37814 80   .05011 60    .66
.35     .06256 25    .72393 75   .38981 25   .05118 75    .65
.36     .06297 60    .71372 80   .40147 20   .05222 40    .64
.37     .06332 55    .70342 65   .41312 35   .05322 45    .63
.38     .06361 20    .69303 60   .42476 40   .05418 80    .62
.39     .06383 65    .68255 95   .43639 05   .05511 35    .61
.40     .06400 00    .67200 00   .44800 00   .05600 00    .60
.41     .06410 35    .66136 05   .45958 95   .05684 65    .59
.42     .06414 80    .65064 40   .47115 60   .05765 20    .58
.43     .06413 45    .63985 35   .48269 65   .05841 55    .57
.44     .06406 40    .62899 20   .49420 80   .05913 60    .56
.45     .06393 75    .61806 25   .50568 75   .05981 25    .55
.46     .06375 60    .60706 80   .51713 20   .06044 40    .54
.47     .06352 05    .59601 15   .52853 85   .06102 95    .53
.48     .06323 20    .58489 60   .53990 40   .06156 80    .52
.49     .06289 15    .57372 45   .55122 55   .06205 85    .51
.50     .06250 00    .56250 00   .56250 00   .06250 00    .50
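The rows of Table XV B match cubic (four-point) Lagrange weights. These weights assume tabular points at t = -1, 0, 1, 2, with p measured from t = 0:

- C-1 = -p(1 - p)(2 - p)/6
- C0 = (1 - p^2)(2 - p)/2
- C1 = p(1 + p)(2 - p)/2
- C2 = -p(1 - p^2)/6

This too is our reading of the table; for example, p = .30 gives .05950, .77350, .33150, .04550. The function names are hypothetical. The weights at 1 - p are the weights at p in reverse order. That symmetry lets one half of the table serve both p < .5 and p > .5, which is what the right-hand p column exploits.

```python
def four_point_coefficients(p):
    """Cubic Lagrange weights for tabular points at t = -1, 0, 1, 2,
    evaluated at t = p with 0 <= p <= 1. Returns (C_-1, C0, C1, C2);
    the outer two are negative for 0 < p < 1."""
    c_m1 = -p * (1.0 - p) * (2.0 - p) / 6.0
    c0 = (1.0 - p * p) * (2.0 - p) / 2.0
    c1 = p * (1.0 + p) * (2.0 - p) / 2.0
    c2 = -p * (1.0 - p * p) / 6.0
    return c_m1, c0, c1, c2


def interpolate4(ys, p):
    """ys = (y_-1, y_0, y_1, y_2); estimate y a fraction p past y_0."""
    return sum(c * y for c, y in zip(four_point_coefficients(p), ys))


if __name__ == "__main__":
    # Magnitudes for the p = .30 row of Table XV B.
    print([round(abs(c), 7) for c in four_point_coefficients(0.30)])
    # Symmetry: the weights at p = .70 are those at p = .30 reversed.
    print([round(c, 7) for c in four_point_coefficients(0.70)])
    # x at I = .4505 from the I = .449 ... .452 rows of Table XV C.
    print(interpolate4((1.635234, 1.644854, 1.654628, 1.664563), 0.5))
```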
SECTION 3. NORMAL PROBABILITY FUNCTIONS

The arguments p and q of Table XV C are the
areas of the larger and the smaller tails of a
unit normal distribution dichotomized at the
point x, where x is a deviate expressed in
standard-score units. I is the area from the
mean to this point x, and z is the ordinate at
the point x.
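Any row of Table XV C can be checked today by recomputing it from the definitions above. The sketch assumes Python 3.8 or later, whose standard-library `statistics.NormalDist` provides `inv_cdf` and `pdf`. The helper `normal_row` is hypothetical. Because x is obtained as the deviate whose upper tail is q, the helper is valid only for 0 <= I < .5.

```python
from statistics import NormalDist

UNIT_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def normal_row(I):
    """Recompute the Table XV C entries for the argument I (0 <= I < .5)."""
    q = 0.5 - I                  # area of the smaller tail
    p = 0.5 + I                  # area of the larger tail
    x = UNIT_NORMAL.inv_cdf(p)   # standard-score deviate at the dichotomy
    z = UNIT_NORMAL.pdf(x)       # ordinate at x
    return {"I": I, "x": x, "z": z, "q": q,
            "z/q": z / q, "z/p": z / p, "pq": p * q, "p": p}


if __name__ == "__main__":
    for I in (0.125, 0.450):
        r = normal_row(I)
        print(f"{r['I']:.3f}  {r['x']:.6f}  {r['z']:.6f}  {r['q']:.3f}  "
              f"{r['z/q']:.5f}  {r['z/p']:.5f}  {r['pq']:.6f}  {r['p']:.3f}")
```

Rounded as in the table, the I = .450 row gives x = 1.644854 and z = .103136, in agreement with the printed values.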

The E' footings at the bottoms of the columns
of this table give the maximum linear interpolation
errors for I, p, or q arguments halfway down
the columns. On the last two pages of this table
E'', the maximum three-point interpolation error
when I, p, or q is the argument, is also given.
A further set of footings gives the maximum
inverse linear interpolation error in I, p, or q
when x is the argument.
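The linear-interpolation footings described above can be illustrated numerically. The sketch interpolates x linearly between two adjacent I arguments and finds the largest discrepancy from the exact deviate. It samples a single interval. `max_linear_error_x` is hypothetical, and taking a maximum over sample points is our illustration, not the procedure used to compute the printed footings.

```python
from statistics import NormalDist

UNIT_NORMAL = NormalDist()


def x_of(I):
    """Standard-score deviate x for the area I from the mean."""
    return UNIT_NORMAL.inv_cdf(0.5 + I)


def max_linear_error_x(I_lo, step=0.001, samples=200):
    """Largest |error| of straight-line interpolation of x between the
    tabular arguments I_lo and I_lo + step, found by sampling."""
    x0, x1 = x_of(I_lo), x_of(I_lo + step)
    worst = 0.0
    for k in range(1, samples):
        t = k / samples
        linear = x0 + t * (x1 - x0)
        worst = max(worst, abs(linear - x_of(I_lo + t * step)))
    return worst


if __name__ == "__main__":
    for I_lo in (0.120, 0.450):
        print(f"I = {I_lo:.3f}: {max_linear_error_x(I_lo):.2e}")
```

Near I = .12 the error is a few units in the seventh decimal. Near I = .45 it is about 2 units in the fifth decimal. This growth toward the tail is consistent with the text's provision of three-point footings on the last two pages.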





STATISTICAL TABLES 


TABLE XV C 



.000000 

.398942 

.002507 

.005013 

.007520 

■398941 

•398937 

•398931 

.010027 

.012533 

.015040 

.398922 

.398911 

.398897 

.017547 

.398881 

.020054 

.398862 

.022562 

•398841 


.012 

.O3OO84 

.398762 

.013 

.032592 

•398730 

.014 

.035100 

.398697 

.015 

.037608 

.398660 

.016 

.O4OII7 

.398621 

.017 

.042626 

.398580 

.018 

.045135 

•398536 

.019 

.O47644 

.398490 


.050154 

.39»44I 

.052664 

.055174 

.057684 

•398389 

•398336 

.398279 


.062707 .398159 


.070243 .397959 


.O8O298 .397658 

.082813 .397577 


2/2 Z lP Pi 

.79788 79788 .250000 


.79948 

.80108 

.80268 

.80428 

.80588 

.80748 


79I5I 

78992 

78833 


75527 

75371 

75215 


249999 

249996 

249991 

249984 

249975 

249964 

249951 

249936 

249919 


249879 

249856 

249831 

249804 

249775 

249744 

2497 1 1 
249676 

299639 

249600 

249559 

249516 

24947I 

249424 

249375 

249324 

24927I 

249216 

249159 


249O39 

248976 

2489II 

248844 

248775 

248704 

248631 

248556 

248479 
TABLE XV C (CONTINUED)



X 

z 


.102953 

.105474 

.107995 

•396834 

■396729 

.396623 

.110516 

.113039 

.115562 

•396513 

.396401 

.396287 

.118085 

.120610 

.123135 

.396170 

.396051 

.395929 

.125661 

■395805 

.128188 

.130716 

.133245 

.395678 

•395549 

•395417 

.135774 

.138304 

.140835 

.395282 

• 395 H 5 

.395005 

•143367 

.145900 

.148434 

.394863 

.394719 

.394572 



.153505 

.156042 

.158580 

.394270 

.394115 

•393957 

.161119 

.163658 

.166199 

.393798 

■393635 

.393470 

.168741 

.171285 

.173829 

.393303 

•393133 

.392960 

•176374 

•392785 

.178921 

.181468 

.184017 

.392608 

•392427 

•392245 

.186567 

.189118 

.191671 

•392059 

.391870 

.391681 

.194225 

.196780 

•199336 

.391488 

• 39 I2 93 

.391095 

.201893 

.390894 


.2484OO 

.248319 

.248236 

.248151 

.248064 

•247975 

.247884 

. 24779 I 

.247696 

.247599 


•247399 

.247296 

. 24719 I 

.247084 

246975 

,246864 

,246751 

.246636 

.246519 


.2464OO 

246279 

246156 

24603 I 

2459 O 4 

245775 

245644 

245511 

245376 

245239 


•244959 

.244816 

.244671 

.244524 

•244375 

.244224 

. 244 O 7 I 

.243916 

•243759 

.2436.OO 



OOOOO + 


000000+ 
TABLE XV C (CONTINUED)

I     x         z        q     z/q      z/p     pq       p
.080  .201893   .390894  .420  .93070   .67396  .243600  .580
.081  .204452   .390691  .419  .93244   .67245  .243439  .581
.082  .207013   .390485  .418  .93417   .67094  .243276  .582
.083  .209574   .390277  .417  .93592   .66943  .243111  .583
.084  .212137   .390066  .416  .93766   .66792  .242944  .584
.085  .214702   .389852  .415  .93940   .66641  .242775  .585
.086  .217267   .389636  .414  .94115   .66491  .242604  .586
.087  .219835   .389418  .413  .94290   .66340  .242431  .587
.088  .222403   .389197  .412  .94465   .66190  .242256  .588
.089  .224973   .388973  .411  .94641   .66040  .242079  .589
.090  .227545   .388747  .410  .94816   .65889  .241900  .590
.091  .230118   .388518  .409  .94992   .65739  .241719  .591
.092  .232693   .388287  .408  .95168   .65589  .241536  .592
.093  .235269   .388053  .407  .95345   .65439  .241351  .593
.094  .237847   .387816  .406  .95521   .65289  .241164  .594
.095  .240426   .387577  .405  .95698   .65139  .240975  .595
.096  .243007   .387335  .404  .95875   .64989  .240784  .596
.097  .245590   .387091  .403  .96052   .64839  .240591  .597
.098  .248174   .386844  .402  .96230   .64690  .240396  .598
.099  .250760   .386595  .401  .96408   .64540  .240199  .599
.100  .253347   .386342  .400  .96586   .64390  .240000  .600
.101  .255936   .386088  .399                   .239799  .601
.102  .258527   .385831  .398                   .239596  .602
.103  .261120   .385571  .397                   .239391  .603
.104  .263714   .385308  .396                   .239184  .604
.105  .266311   .385043  .395                   .238975  .605
.106  .268909   .384776  .394                   .238764  .606
.107  .271508   .384506  .393  .97839           .238551  .607
.108  .274110   .384233  .392  .98019           .238336  .608
.109  .276714   .383957  .391  .98199           .238119  .609
.110  .279319   .383679  .390                   .237900  .610
.111  .281926   .383399  .389                   .237679  .611
.112  .284536   .383115  .388                   .237456  .612
.113  .287147   .382830  .387                   .237231  .613
.114  .289760   .382541  .386                   .237004  .614
.115  .292375   .382250  .385                   .236775  .615
.116  .294992   .381956  .384                   .236544  .616
.117  .297611   .381660  .383  .99650           .236311  .617
.118  .300232   .381361  .382  .99833           .236076  .618
.119  .302855   .381060  .381  1.00016          .235839  .619
.120  .305481   .380755  .380  1.00199  .61412  .235600  .620



















NORMAL FUNCTIONS 643 


TABLE XV C (CONTINUED) 


I     x         z        q     z/q      z/p     pq       p
.120  .305481   .380755  .380  1.00199  .61412  .235600  .620
.121  .308108   .380449  .379  1.00382  .61264  .235359  .621
.122  .310738   .380139  .378  1.00566  .61116  .235116  .622
.123  .313369   .379827  .377  1.00750  .60967  .234871  .623
.124  .316003   .379513  .376  1.00934  .60819  .234624  .624
.125  .318639   .379195  .375  1.01119  .60671  .234375  .625
.126  .321278   .378875  .374  1.01303  .60523  .234124  .626
.127  .323918   .378553  .373  1.01489  .60375  .233871  .627
.128  .326561   .378227  .372  1.01674  .60227  .233616  .628
.129  .329206   .377900  .371  1.01860  .60079  .233359  .629
.130  .331853   .377569  .370  1.02046  .59932  .233100  .630
.131  .334503   .377236  .369  1.02232  .59784  .232839  .631
.132  .337155   .376900  .368  1.02418  .59636  .232576  .632
.133  .339809   .376562  .367  1.02605  .59488  .232311  .633
.134  .342466   .376220  .366  1.02792  .59341  .232044  .634
.135  .345125   .375877  .365  1.02980  .59193  .231775  .635
.136  .347787   .375530  .364  1.03168  .59046  .231504  .636
.137  .350451   .375181  .363  1.03356  .58898  .231231  .637
.138  .353118   .374829  .362  1.03544  .58751  .230956  .638
.139  .355787   .374475  .361  1.03733  .58603  .230679  .639
.140  .358459   .374118  .360  1.03922  .58456  .230400  .640
.141  .361133   .373758  .359  1.04111  .58309  .230119  .641
.142  .363810   .373395  .358  1.04300  .58161  .229836  .642
.143  .366489   .373030  .357  1.04490  .58014  .229551  .643
.144  .369171   .372662  .356  1.04680  .57867  .229264  .644
.145  .371856   .372292  .355  1.04871  .57720  .228975  .645
.146  .374544   .371919  .354  1.05062  .57573  .228684  .646
.147  .377234   .371543  .353  1.05253  .57426  .228391  .647
.148  .379927   .371164  .352  1.05444  .57278  .228096  .648
.149  .382622   .370783  .351  1.05636  .57131  .227799  .649
.150  .385320   .370399  .350  1.05828  .56984  .227500  .650
.151  .388022   .370012  .349  1.06021  .56837  .227199  .651
.152  .390726   .369623  .348  1.06214  .56691  .226896  .652
.153  .393433   .369231  .347  1.06407  .56544  .226591  .653
.154  .396142   .368836  .346  1.06600  .56397  .226284  .654
.155  .398855   .368439  .345  1.06794  .56250  .225975  .655
.156  .401571   .368038  .344  1.06988  .56103  .225664  .656
.157  .404289   .367635  .343  1.07182  .55957  .225351  .657
.158  .407011   .367230  .342  1.07377  .55810  .225036  .658
.159  .409735   .366821  .341  1.07572  .55663  .224719  .659
.160  .412463   .366410  .340  1.07768  .55517  .224400  .660

TABLE XV C (CONTINUED) 


.412463 .366410 


.415194 

.417928 

.420665 

*423405 

.426148 

.428895 

.431644 

•434397 

•437154 


.442676 

•445443 

.448212 

450986 

•453762 

■456542 

•459326 

.462113 

.464904 


! .365996 
.365580 
.365160 

•364738 

.364314 

.363886 

•363456 

.363023 

•362587 

.362149 

.361707 

.361263 

.360817 

.360367 

•359915 

•359459 

-359001 

.358541 

.358077 




.467699 .357611 



.470497 

•473299 

.476104 

.4789H 

.481727 

.484544 

•487365 
.490189 
.4930P8 | 

•4958'5 o j 

.498687 

•501527 

.504372 

.507221 

.510073 

.512930 

•515792 

.518657 

•521527 


.357H2I 

.356670 

.356195 

•355718 

•355237 

•354754 

.354268 

.353780 

.353288 


.352296 

.351796 

•351293 

.350787 

•350279 

.349767 

.349253 

•348736 

.348216 

•347693 ! 


z/q      z/p     pq       p
1.07768  .55517  .224400  .660
1.07963  .55370  .224079  .661
1.08160  .55224  .223756  .662
1.08356  .55077  .223431  .663
1.08553  .54930  .223104  .664
1.08750  .54784  .222775  .665
1.08948  .54638  .222444  .666
1.09146  .54491  .222111  .667
1.09344  .54345  .221776  .668
1.09543  .54198  .221439  .669
1.09742  .54052  .221100  .670
1.09941  .53906  .220759  .671
1.10141  .53759  .220416  .672
1.10342  .53613  .220071  .673
1.10542  .53467  .219724  .674
1.10743  .53321  .219375  .675
1.10944  .53174  .219024  .676
1.11146  .53028  .218671  .677
1.11348  .52882  .218316  .678
1.11550  .52736  .217959  .679
1.11753  .52590  .217600  .680
1.11957  .52444  .217239  .681
1.12160  .52298  .216876  .682
1.12364  .52152  .216511  .683
1.12569  .52006  .216144  .684
1.12774  .51859  .215775  .685
1.12979  .51713  .215404  .686
1.13185  .51567  .215031  .687
1.13391  .51422  .214656  .688
1.13597  .51275  .214279  .689
1.13804  .51129  .213900  .690
1.14012          .213519  .691
1.14219          .213136  .692
1.14428          .212751  .693
1.14636          .212364  .694
1.14846          .211975  .695
1.15055          .211584  .696
1.15265          .211191  .697
1.15475          .210796  .698
1.15686          .210399  .699
1.15898                   .700



00000 + 













TABLE XV C (CONTINUED)


>524401. -347693 

.527279 -347167 
.530161 .346638 

.533048 .346107 


535940 

538836 

541737 

544642 

547551 

550466 

553385 

556308 

559237 

562170 

565108 

568051 

570999 

573952 

576910 

579873 i 


-345572 

.345035 

-344494 

-343951 

.343405 

.342856 


•341749 

•341191 

-340631 

.340067 

•339500 

.338931 

•338358 

.337783 

.337205 


.336623 


585815 .336039 
588793 .335452 
591777 .334861 

594766 .334268 
597760 .333672 
600760 .333073 

603765 .332470 
606775 .331865 

609791 .331257 

612813 .330646 


615840 

618873 

621912 

,624956 

,628006 

,631062 

,634124 

637192 

,640265 


.330031 

.329414 

.328793 

.328170 

.327544 

.326914 

.326281 

.325646 

.325007 


643345 >324365 


1.15898 

1.16109 

1.16321 

1.16534 

1.16747 

1.16961 

I.I7I75 

1.17389 

1.17604 

1.17820 

1.18036 

1.18252 

1.18469 

1.18687 

1.18904 
1.19123 
1. 19342 

1.19561 
1.19781 
x. 20002 


1.20444 

1.20666 

1.20888 

1.21112 

I.2I335 

I.2I559 

1.21784 

1.22009 

1.22235 

1.22461 

1.22688 

1.22916 

I.23H3 

1.23372 

1.23601 

1..2383X 

1.24061 

1.24292 

1.24524 

1.24756 


.209599 

.209196 

.208791 

208384 

.207975 

.207564 

.207151 

.206736 

.206319 

.205900 

.205479 

.205056 

.204631 

.204204 

.203775 

.203344 

.2029 IX 
.202476 
.202039 


.201159 

.200716 

.200271 

.199824 

.199375 

.198924 




.193831 

•193356 

.192879 












TABLE XV C (CONTINUED)




643.145 -324365 


646431 

649524 

,652622 

655727 

,658838 

.661955 

.665079 

.668209 

.671346 


.674490 


.677640 

.680797 

.683961 

.687131 

.690309 

•693493 

.696685 

.699884 

.703090 


.323720 

.323072 

.322421 

.32x767 

.321110 

.320449 

.319786 

.319x19 

.318449 


.317101 

.316421 

■315739 

•315053 

•314365 

•313673 

.312978 

.312279 

•311578 


706303 .310873 


709523 .310165 
712751 .309454 
715986 .308740 

719229 .308022 
752479 .307301 

725737 -306577 

729003 .305850 
732276 .305119 
735558 .304385 


738847 .303648 



742144 

745450 

748763 

752085 

755415 

758754 

762101 

765456 

768820 


JDOOOOl 

r n 


.302908 

.302164 

.301417 

.3O0666 

.299913 

•299155 

•298395 

.297631 

.296864 

.296094 


z/q      z/p     pq       p
1.24756  .43833  .192400  .740
1.24988  .43687  .191919  .741
1.25222  .43541  .191436  .742
1.25456  .43394  .190951  .743
1.25690  .43248  .190464  .744
1.25925  .43102  .189975  .745
1.26161  .42956  .189484  .746
1.26398  .42809  .188991  .747
1.26635  .42663  .188496  .748
1.26872  .42517  .187999  .749
1.27111  .42370  .187500  .750
1.27350  .42224  .186999  .751
1.27589  .42077  .186496  .752
1.27830  .41931  .185991  .753
1.28070  .41784  .185484  .754
1.28312  .41638  .184975  .755
1.28554  .41491  .184464  .756
1.28798  .41345  .183951  .757
1.29041  .41198  .183436  .758
1.29285  .41051  .182919  .759
1.29531  .40904  .182400  .760
1.29776  .40758  .181879  .761
1.30023  .40611  .181356  .762
1.30270  .40464  .180831  .763
1.30518  .40317  .180304  .764
1.30767  .40170  .179775  .765
1.31016  .40023  .179244  .766
1.31266  .39876  .178711  .767
1.31517  .39729  .178176  .768
1.31768  .39582  .177639  .769
1.32021  .39435  .177100  .770
1.32274  .39288  .176559  .771
1.32528  .39140  .176016  .772
1.32783  .38993  .175471  .773
1.33038  .38846  .174924  .774
1.33294  .38698  .174375  .775
1.33551  .38551  .173824  .776
1.33809  .38403  .173271  .777
1.34068  .38256  .172716  .778
1.34328  .38108  .172159  .779
1.34588  .37961  .171600  .780




















TABLE XV C (CONTINUED)


.772193 .296094 .220 


•775575 I 

.778966 

.782365 

785774 

.789192 

.792619 

.796055 

.799501 

.802956 



.295320 .219 

.294542 .218 

.293762 .217 

.292978 .216 

.292:90 .215 

.29X399 .214 

.290605 .213 
.289807 .212 

.289OO6 .211 


.806421 .288201 


.809896 

.813380 

.816875 

.820379 

.823894 

.827418 

.830953 

.834499 

.838055 


.845199 

.848787 

.852386 

.855996 

.859617 

.863250 

.866894 

.870550 

.874217 


.287393 

.286582 

.285766 

.284948 

.284126 

.283300 

.282471 

.281638 

.280802 

.279962 

.279II8 

.278272 

.277421 

.276567 

.275709 

.274847 

.273982 

.273114 

,272241 


z/q 

z/p 

1.34588

.37961

I .*34849 
i .35 in 
1-35374 

■37813 

.37665 

.37517 

1.35638 

.37370 


1.36168 

.37074 

1.36434 

1.36701 

1.36970 

.36926 

■36778 

.36629 

1.37239 

.36481 

x . 37509 

•36333 


1.38051 

1.38324 

1.38598 


1.39425 | 

1.39702 

1.39981 

I. 4026 O 
1. 40541 
I .40823 

I. 4 IIO 6 

1.41389 

I.41674 

I. 4196 O 

I.42247 

1.42535 



.877896 .271365 .190 1 42824 


.881587 
.885291 
. 889 OO 6 j 

.892733 

.896473 

.900226 

.903991 

.907770 

.911561 


.270486 .189 

.269602 .l88 

.268715 .187 

.267824 .186 

.266929 .185 

.266031 .184 

.265129 .183 

.264223 .182 

.263313 - lSl 


1 . 43 IH 
1.43405 
x. 43698 

i -4399 1 
1.44286 
1.44582 

1.44879 

I- 45 I 77 

1.45477 


.915365 .262400 j .180 I 145778 


[61791 

. 161196 
.160559 

.160000 | .800 

.159399 
.158796 
.158191 

.157584 
.156975 

.156364 

.155751 
.155136 
.154519 

.153900 

.153279 
.152656 
.152031 

.151404 

.150775 

.150144 

.149511 

.148876 

.148239 

.147600 
TABLE XV C (CONTINUED)

I     x         z        q     z/q      z/p     pq       p
.320  .915365   .262400  .180  1.45778  .32000  .147600  .820
.321  .919183   .261483  .179  1.46080  .31849  .146959  .821
.322  .923014   .260562  .178  1.46383  .31699  .146316  .822
.323  .926859   .259637  .177  1.46688  .31548  .145671  .823
.324  .930717   .258708  .176  1.46993  .31397  .145024  .824
.325  .934589   .257775  .175  1.47300  .31245  .144375  .825
.326  .938476   .256839  .174  1.47609  .31094  .143724  .826
.327  .942376   .255898  .173  1.47918  .30943  .143071  .827
.328  .946291   .254954  .172  1.48229  .30792  .142416  .828
.329  .950221   .254006  .171  1.48541  .30640  .141759  .829
.330  .954165   .253054  .170  1.48855  .30488  .141100  .830
.331  .958125   .252097  .169  1.49170  .30337  .140439  .831
.332  .962099   .251137  .168  1.49486  .30185  .139776  .832
.333  .966088   .250173  .167  1.49804  .30033  .139111  .833
.334  .970093   .249205  .166  1.50123  .29881  .138444  .834
.335  .974114   .248233  .165  1.50444  .29728  .137775  .835
.336  .978150   .247257  .164  1.50766  .29576  .137104  .836
.337  .982203   .246277  .163  1.51090  .29424  .136431  .837
.338  .986271   .245292  .162  1.51415  .29271  .135756  .838
.339  .990356   .244304  .161  1.51742  .29118  .135079  .839
.340                     .160  1.52070  .28966  .134400  .840
.341  .998576   .242315  .159  1.52399  .28813  .133719  .841
.342  1.002712  .241315  .158  1.52731  .28660  .133036  .842
.343  1.006864  .240310  .157  1.53064  .28507  .132351  .843
.344  1.011034  .239301  .156  1.53398  .28353  .131664  .844
.345  1.015222  .238288  .155  1.53734  .28200  .130975  .845
.346  1.019428  .237270  .154  1.54071  .28046  .130284  .846
.347  1.023651  .236249  .153  1.54411  .27892  .129591  .847
.348  1.027893  .235223  .152  1.54752  .27739  .128896  .848
.349  1.032154  .234193  .151  1.55095  .27585  .128199  .849
.350  1.036433  .233159  .150  1.55439  .27430  .127500  .850
.351  1.040732  .232120  .149  1.55785  .27276  .126799  .851
.352  1.045050  .231077  .148  1.56133  .27122  .126096  .852
.353  1.049387  .230030  .147  1.56483  .26967  .125391  .853
.354  1.053744  .228979  .146  1.56835  .26813  .124684  .854
.355  1.058122  .227923  .145  1.57188  .26658  .123975  .855
.356  1.062519  .226862  .144  1.57543  .26503  .123264  .856
.357  1.066938  .225798  .143  1.57901  .26347  .122551  .857
.358  1.071377  .224728  .142  1.58259  .26192  .121836  .858
.359  1.075837  .223655  .141  1.58621  .26037  .121119  .859
.360  1.080319  .222577  .140  1.58983  .25881  .120400  .860

e" 

.0000006 

.000002 

• 000001 


,00000+ , 

.00000 + 

. 000000 + 

















TABLE XV C (continued) 


X 

z 

1.0803x9 

.222577 

1.084823 

1.089349 

1.093897 

1.098468 
I. x 03063 
1.107680 

1.112321 

1.1x6987 

1.12x676 

.221494 

.220407 

•219315 

.218219 

.217119 

.216013 

.214903 

.2x3789 

.212669 

1.126391 

.211545 


371 *1.131131 

372 1.135896 
37 3 1.140688 

374 1. 145505 

375 1. 150349 

376 1.155221 

3 77 1.160120 

378 1.165047 

379 1. 1 70002 

380 1.174987 

381 1. 1 80001 

382 1.1.85044 

383 1. 190118 

384 1. 195223 

385 1.200359 

,386 1.205527 

.387 1. 210727 

.388 1.215960 

.389 1. 221227 


.210416 

.209283 

.208145 

.207001 

.205853 

.204701 

.203543 

.202380 

.201213 


1. 1 80001 .198863 

1.1.85044 .197680 
1. 190118 .196493 

1. 195223 .195300 
1.200359 .194102 

1.205527 .192900 

1. 210727 .191691 
1.215960 .190478 
1. 221227 .189259 

1.226528 .188036 


.391 1.231864 

,392 1.237235. 

393 1.242642 

394 1.248085 

395 1.253565 

396 1.259084 

397 1.264641 

,398 1.270237 

399 1-275874 


.186806 

.185572 

.184332 

.183087 

.181836 

.180579 

.179318 

.178050 

.176777 


1.281552 , .175498 



1.59348 

1-59715 

1.60084 

1.60455 

1.60829 

1.61204 

1.61581 

1.61961 

1.62343 

1.62727 

1.63113 
x. 63502 
I.63894 

X. 64286 
1.64683 
1.65081 

1.65482 

1.65885 

1.66292 

1.66700 

1.67112 

1.67525 

1.67943 

1.68362 

1.68785 

1.69211 

1.69638 

1.70070 

X. 70504 

X.7094X 

1.71382 

1.71826 

1.72273 

1.72724 

1.73177 

1.73634 

1,74095 

1-74559 

1.75027 

1-75498 


119679 

118956 

II 823 X 


IX53H 
X 14576 
1x3839 


110124 

169375 

108624 

107871 

107116 

106359 


10023 I 
099456 
098679 


0971 19 
096336 

095551 

094764 

093975 

093184 

092391 

091596 

090799 


.19500 ! .090000 I .900 


,ooooo + Looqooo + 
TABLE XV C (CONTINUED)





X 

0 

1.281552 

■175498 

1.287271 

1.293032 

1.298837 

x. 304685 

1.3x0579 

1.3x6519 

1:322505 

1.328539 

1.334622 

.174214 

.172924 

.171628 

.170326 

.169018 

.167705 

.166385 

.165060 

.163728 

1.340755 

.162391 

1-348939 

1.353174 

1.359463 

1.365806 

1.372204 

1.378659 

1.385172 

1.391744 

1.398377 

.161047 

•159697 

.158340 

.156978 

•155609 

■154233 

■152851 

•151463 

.150068 

1.405072 

.148666 

1.411830 

1.418654 

1.425544 

143250 3 
i x. 43953 x 
1.446632 

1.453806 

1.461056 

1.468384 

I-47579X 

.147258 

■145843 

.144420 

.142991 

.14X555 

.140112 

.138662 

.137205 

.135740 

.134268 

1.483280 

1.490853 

14985x3 

1.506262 

1.514102 

1.522036 

1.530068 

1.538199 

I.546433 

.132788 
! .131301 
.129807 

,128304 

.126794 

.125276 

.123750 

.122216 

.120674 

1.554774 

.XI9I23 




.089 I.8095 

.088 1.8147 

.087 1.8200 


.083 1.8416 

.082 1.8471 

.081 1.8527 


.079 1.8640 

.078 1.8698 

.077 1.8756 

.076 1.8815 

.075 1.8874 

.074 1.8934 

*073 1.8995 

.072 1.9056 

.071 1.9118 


089199 

088396 

087591 

086784 

085975 

085164 

084351 

083536 

,082719 

,081900 

.081079 

.0802^ 

.079431 

.078604 

■077775 

.076944 

.076111 

.075276 

.074439 

.073600 

.072759 

.071916 

.071071 

.070224 

•069375 

.068524 

.067671 

.066816 

065959 

.065100 

,064239 

,063376 

,062511 

,061644 

,060775 

,059904 

059031 

.058156 

>057279 

.056400 


OOOO + 
TABLE XV C (CONTINUED)

I     x         z        q     z/q      z/p     pq       p
.440  1.554774  .119123  .060  1.9854   .12673  .056400  .940
.441  1.563224  .117564  .059  1.9926   .12494  .055519  .941
.442  1.571787  .115996  .058  1.9999   .12314  .054636  .942
.443  1.580467  .114420  .057  2.0074   .12134  .053751  .943
.444  1.589268  .112836  .056  2.0149   .11953  .052864  .944
.445  1.598193  .111242  .055  2.0226   .11772  .051975  .945
.446  1.607248  .109639  .054  2.0304   .11590  .051084  .946
.447  1.616436  .108027  .053  2.0382   .11407  .050191  .947
.448  1.625763  .106406  .052  2.0463   .11224  .049296  .948
.449  1.635234  .104776  .051  2.0544   .11041  .048399  .949
.450  1.644854  .103136  .050  2.0627   .10856  .047500  .950
.451  1.654628  .101486  .049  2.0711   .10672  .046599  .951
.452  1.664563  .099826  .048  2.0797   .10486  .045696  .952
.453  1.674665  .098157  .047  2.0884           .044791  .953
.454  1.684941  .096477  .046  2.0973   .10113  .043884  .954
.455  1.695398  .094787  .045                   .042975  .955
.456  1.706044  .093086  .044  2.1156                    .956
.457  1.716886  .091375  .043  2.1250           .041151  .957
.458  1.727934  .089652  .042  2.1346           .040236  .958
.459  1.739198  .087919  .041  2.1444           .039319  .959
.460  1.750686  .086174  .040                            .960
.461  1.762410  .084417  .039                            .961
.462  1.774382  .082649  .038                            .962
.463  1.786614  .080868  .037                            .963
.464  1.799118  .079075  .036                            .964
.465  1.811911  .077270  .035                            .965
.466  1.825007  .075452  .034                            .966
.467  1.838424  .073620  .033                            .967
.468  1.852180  .071775  .032                            .968
.469  1.866296  .069915  .031                            .969
.470  1.880794  .068042  .030                            .970
.471  1.895698  .066154  .029                            .971
.472  1.911036  .064250  .028                            .972
.473  1.926837  .062332  .027                            .973
.474  1.943134  .060397  .026                            .974
.475  1.959964  .058445  .025                            .975
.476  1.977368  .056476  .024                            .976
.477  1.995393  .054490  .023                            .977
.478  2.014091  .052485  .022                            .978
.479  2.033520  .050462  .021                            .979
.480  2.053749  .048418  .020                            .980



. 00003 
OOOOOO+ 

ET U 


. 0000+ . 00 
TABLE XV C (CONTINUED)

I     x         z        q     z/q      z/p     pq       p
.480  2.053749  .048418  .020  2.4209   .04941  .019600  .980
.481  2.074855  .046354  .019  2.4397   .04725  .018639  .981
.482  2.096927  .044268  .018  2.4593   .04508  .017676  .982
.483  2.120072  .042160  .017  2.4800   .04289  .016711  .983
.484  2.144411  .040028  .016  2.5018   .04068  .015744  .984
.485  2.170090  .037870  .015  2.5247   .03845  .014775  .985
.486  2.197286  .035687  .014  2.5491   .03619  .013804  .986
.487  2.226211  .033475  .013  2.5750   .03392  .012831  .987
.488  2.257129  .031234  .012  2.6028   .03161  .011856  .988
.489  2.290370  .028960  .011  2.6327   .02928  .010879  .989
.490  2.326348  .026652  .010  2.665    .02692  .009900  .990
.491  2.365618  .024306  .009  2.701    .02453  .008919  .991
.492  2.408916  .021920  .008  2.740    .02210  .007936  .992
.493  2.457264  .019487  .007  2.784    .01962  .006951  .993
.494  2.512144  .017003  .006  2.834    .01711  .005964  .994
.495  2.575829  .014460  .005  2.892    .01453  .004975  .995
.496  2.652070  .011847  .004  2.962    .01189  .003984  .996
.497  2.747781  .009149  .003  3.050    .00918  .002991  .997
.498  2.878161  .006340  .002  3.170    .00635  .001996  .998
.499  3.090229  .003367  .001  3.367    .00337  .000999  .999

E ' ] 

.0005 

. 000005 





■ 

p\ i ? 

.00005 















SECTION 4. SQUARE AND CUBE ROOTS 

The methods given in Chapter XIII, Section 13,
for extracting square and cube roots on a
computing machine are particularly expeditious
when they start from a good first approximation.
The two-figure argument table herewith provides
such an approximation, especially if the
approximate root is obtained by linear
interpolation.
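Chapter XIII is not part of this section, so the iteration below is an assumption. It uses Newton's method, a standard way to refine a root, seeded with a first approximation read from Table XV D by linear interpolation. The book's own machine procedure may differ in detail. The helper names `newton_sqrt`, `newton_cbrt`, and `lerp` are hypothetical.

```python
import math


def newton_sqrt(n, guess, tol=1e-12, max_iter=50):
    """Refine a rough square root of n with Newton's iteration
    x <- (x + n/x) / 2, which roughly doubles the correct digits per step."""
    x = guess
    for _ in range(max_iter):
        nxt = 0.5 * (x + n / x)
        if abs(nxt - x) <= tol * abs(nxt):
            return nxt
        x = nxt
    return x


def newton_cbrt(n, guess, tol=1e-12, max_iter=50):
    """Refine a rough cube root of n with Newton's iteration
    x <- (2x + n/x**2) / 3."""
    x = guess
    for _ in range(max_iter):
        nxt = (2.0 * x + n / (x * x)) / 3.0
        if abs(nxt - x) <= tol * abs(nxt):
            return nxt
        x = nxt
    return x


def lerp(t, y0, y1):
    """Linear interpolation a fraction t of the way from y0 to y1."""
    return y0 + t * (y1 - y0)


if __name__ == "__main__":
    n = 2.37
    # First approximations from the N = 2.3 and N = 2.4 rows of Table XV D.
    sqrt_guess = lerp(0.7, 1.51658, 1.54919)
    cbrt_guess = lerp(0.7, 1.32001, 1.33887)
    print(sqrt_guess, newton_sqrt(n, sqrt_guess), math.sqrt(n))
    print(cbrt_guess, newton_cbrt(n, cbrt_guess), n ** (1 / 3))
```

Because the interpolated starting value is already correct to about four figures, one or two Newton steps suffice. This is the practical benefit the text attributes to a good first approximation.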















TABLE XV D

SQUARE AND CUBE ROOTS

N        √N        √10N      ∛N        ∛10N      ∛100N
1.0      1.00000   3.16228   1.00000   2.15443   4.64159
1.1      1.04881   3.31662   1.03228   2.22398   4.79142
1.2      1.09545   3.46410   1.06266   2.28943   4.93242
1.3      1.14018   3.60555   1.09139   2.35133   5.06580
1.4      1.18322   3.74166   1.11869   2.41014   5.19249
1.5      1.22474   3.87298   1.14471   2.46621   5.31329
1.6      1.26491   4.00000   1.16961   2.51984   5.42884
1.7      1.30384   4.12311   1.19348   2.57128   5.53966
1.8      1.34164   4.24264   1.21644   2.62074   5.64622
1.9      1.37840   4.35890   1.23856   2.66840   5.74890
2.0      1.41421   4.47214   1.25992   2.71442   5.84804
2.1      1.44914   4.58258   1.28058   2.75892   5.94392
2.2      1.48324   4.69042   1.30059   2.80204   6.03681
2.3      1.51658   4.79583   1.32001   2.84387   6.12693
2.4      1.54919   4.89898   1.33887   2.88450   6.21447
2.5      1.58114   5.00000   1.35721   2.92402   6.29961
2.6      1.61245   5.09902   1.37507   2.96250   6.38250
2.7      1.64317   5.19615   1.39248   3.00000   6.46330
2.8      1.67332   5.29150   1.40946   3.03659   6.54213
2.9      1.70294   5.38516   1.42604   3.07232   6.61911
3.0      1.73205   5.47723   1.44225   3.10723   6.69433

MAXIMUM INTERPOLATION ERRORS IN THE
NEIGHBORHOOD OF THE INDICATED N'S

1.0, E'  .00031    .00099    .00028    .00060    .00130
1.0, E'' .00002    .00007    .00002    .00004    .00020
2.0, E'  .00011    .00035    .00009    .00019    .00041
2.0, E''           .00001              .00001    .00002
3.0, E'  .00007    .00020    .00005    .00010    .00022
3.0, E''                                         .00001
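The maximum-interpolation-error rows can be read as bounds on straight-line interpolation between adjacent N's. Under that reading, which is ours rather than the text's, E' is the linear-interpolation bound. The check below interpolates halfway between N = 1.0 and N = 1.1, roughly where the error of straight-line interpolation is largest, and compares the results with exact roots. The names `lerp_table`, `ROW_10`, and `ROW_11` are hypothetical.

```python
import math


def lerp_table(n, n0, y0, n1, y1):
    """Straight-line interpolation between tabular points (n0, y0) and (n1, y1)."""
    return y0 + (n - n0) / (n1 - n0) * (y1 - y0)


# Rows N = 1.0 and N = 1.1 of Table XV D (columns sqrt(N) and cbrt(100N)).
ROW_10 = {"sqrt": 1.00000, "cbrt100": 4.64159}
ROW_11 = {"sqrt": 1.04881, "cbrt100": 4.79142}

n = 1.05  # halfway between the tabular arguments
err_sqrt = abs(lerp_table(n, 1.0, ROW_10["sqrt"], 1.1, ROW_11["sqrt"])
               - math.sqrt(n))
err_cbrt = abs(lerp_table(n, 1.0, ROW_10["cbrt100"], 1.1, ROW_11["cbrt100"])
               - (100 * n) ** (1 / 3))

print(f"sqrt(N):    error {err_sqrt:.5f}  (footing E' at 1.0: .00031)")
print(f"cbrt(100N): error {err_cbrt:.5f}  (footing E' at 1.0: .00130)")
```

Both errors fall just under the printed 1.0, E' footings, which supports reading those footings as linear-interpolation bounds.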
TABLE XV D (Continued)

SQUARE AND CUBE ROOTS

   N      √N       √10N      ∛N       ∛10N     ∛100N

 3.0   1.73205   5.47723   1.44225   3.10723   6.69433
 3.1   1.76068   5.56776   1.45810   3.14138   6.76790
 3.2   1.78885   5.65685   1.47361   3.17480   6.83990
 3.3   1.81659   5.74456   1.48881   3.20753   6.91042
 3.4   1.84391   5.83095   1.50369   3.23961   6.97953
 3.5   1.87083   5.91608   1.51829   3.27107   7.04730
 3.6   1.89737   6.00000   1.53262   3.30193   7.11379
 3.7   1.92354   6.08276   1.54668   3.33222   7.17905
 3.8   1.94936   6.16441   1.56049   3.36198   7.24316
 3.9   1.97484   6.24500   1.57406   3.39121   7.30614
 4.0   2.00000   6.32456   1.58740   3.41995   7.36806
 4.1   2.02485   6.40312   1.60052   3.44822   7.42896
 4.2   2.04939   6.48074   1.61343   3.47603   7.48887
 4.3   2.07364   6.55744   1.62613   3.50340   7.54784
 4.4   2.09762   6.63325   1.63864   3.53035   7.60590
 4.5   2.12132   6.70820   1.65096   3.55689   7.66309
 4.6   2.14476   6.78233   1.66310   3.58305   7.71944
 4.7   2.16795   6.85565   1.67507   3.60883   7.77498
 4.8   2.19089   6.92820   1.68687   3.63424   7.82974
 4.9   2.21359   7.00000   1.69850   3.65931   7.88374
 5.0   2.23607   7.07107   1.70998   3.68403   7.93701


MAXIMUM INTERPOLATION ERRORS IN
THE NEIGHBORHOOD OF THE INDICATED N'S

             √N      √10N      ∛N      ∛10N     ∛100N
 3.0, E′   .00007   .00020   .00005   .00010   .00022
 3.0, E″                                       .00001
 4.0, E′   .00004   .00008   .00003   .00006   .00013
 5.0, E′   .00003   .00009   .00002   .00004   .00009
TABLE XV D (Continued)

SQUARE AND CUBE ROOTS

   N      √N       √10N      ∛N       ∛10N     ∛100N

 5.0   2.23607   7.07107   1.70998   3.68403   7.93701
 5.1   2.25832   7.14143   1.72130   3.70843   7.98957
 5.2   2.28035   7.21110   1.73248   3.73251   8.04145
 5.3   2.30217   7.28011   1.74351   3.75629   8.09267
 5.4   2.32379   7.34847   1.75441   3.77976   8.14325
 5.5   2.34521   7.41620   1.76517   3.80295   8.19321
 5.6   2.36643   7.48331   1.77581   3.82586   8.24257
 5.7   2.38747   7.54983   1.78632   3.84850   8.29134
 5.8   2.40832   7.61577   1.79670   3.87088   8.33955
 5.9   2.42899   7.68115   1.80697   3.89300   8.38721
 6.0   2.44949   7.74597   1.81712   3.91487   8.43433
 6.1   2.46982   7.81025   1.82716   3.93650   8.48093
 6.2   2.48998   7.87401   1.83709   3.95789   8.52702
 6.3   2.50998   7.93725   1.84691   3.97906   8.57262
 6.4   2.52982   8.00000   1.85664   4.00000   8.61774
 6.5   2.54951   8.06226   1.86626   4.02073   8.66239
 6.6   2.56905   8.12404   1.87578   4.04124   8.70659
 6.7   2.58844   8.18535   1.88520   4.06155   8.75034
 6.8   2.60768   8.24621   1.89454   4.08166   8.79366
 6.9   2.62679   8.30662   1.90378   4.10157   8.83656
 7.0   2.64575   8.36660   1.91293   4.12129   8.87904


MAXIMUM INTERPOLATION ERRORS IN
THE NEIGHBORHOOD OF THE INDICATED N'S

             √N      √10N      ∛N      ∛10N     ∛100N
 5.0, E′   .00003   .00009   .00002   .00004   .00009
 6.0, E′   .00002   .00007   .00001   .00003   .00007
 7.0, E′   .00002   .00005   .00001   .00002   .00005
TABLE XV D (Continued)
SQUARE AND CUBE ROOTS

   N      √N       √10N      ∛N       ∛10N     ∛100N

 7.0   2.64575   8.36660   1.91293   4.12129   8.87904
 7.1   2.66458   8.42615   1.92200   4.14082   8.92112
 7.2   2.68328   8.48528   1.93098   4.16017   8.96281
 7.3   2.70185   8.54400   1.93988   4.17934   9.00411
 7.4   2.72029   8.60233   1.94870   4.19834   9.04504
 7.5   2.73861   8.66025   1.95743   4.21716   9.08560
 7.6   2.75681   8.71780   1.96610   4.23582   9.12581
 7.7   2.77489   8.77496   1.97468   4.25432   9.16566
 7.8   2.79285   8.83176   1.98319   4.27266   9.20516
 7.9   2.81069   8.88819   1.99163   4.29084   9.24434
 8.0   2.82843   8.94427   2.00000   4.30887   9.28318
 8.1   2.84605   9.00000   2.00830   4.32675   9.32170
 8.2   2.86356   9.05539   2.01653   4.34448   9.35990
 8.3   2.88097   9.11043   2.02469   4.36207   9.39780
 8.4   2.89828   9.16515   2.03279   4.37952   9.43539
 8.5   2.91548   9.21954   2.04083   4.39683   9.47268
 8.6   2.93258   9.27362   2.04880   4.41400   9.50969
 8.7   2.94958   9.32738   2.05671   4.43105   9.54640
 8.8   2.96648   9.38083   2.06456   4.44796   9.58284
 8.9   2.98329   9.43398   2.07235   4.46475   9.61900
 9.0   3.00000   9.48683   2.08008   4.48140   9.65489
 9.1   3.01662   9.53939   2.08776   4.49794   9.69052
 9.2   3.03315   9.59166   2.09538   4.51436   9.72589
 9.3   3.04959   9.64365   2.10294   4.53065   9.76100
 9.4   3.06594   9.69536   2.11045   4.54684   9.79586
 9.5   3.08221   9.74679   2.11791   4.56290   9.83048
 9.6   3.09839   9.79796   2.12532   4.57886   9.86485
 9.7   3.11448   9.84886   2.13267   4.59470   9.89898
 9.8   3.13050   9.89949   2.13997   4.61044   9.93288
 9.9   3.14643   9.94987   2.14723   4.62607   9.96655
10.0   3.16228  10.00000   2.15443   4.64159  10.00000

MAXIMUM INTERPOLATION ERRORS IN
THE NEIGHBORHOOD OF THE INDICATED N'S

             √N      √10N      ∛N      ∛10N     ∛100N
 7.0, E′   .00002   .00005            .00002   .00005
 8.0, E′   .00001   .00004            .00002   .00004
 9.0, E′   .00001   .00004            .00001   .00003
10.0, E′   .00001   .00003            .00001   .00003
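Because each entry of Table XV D is a root rounded to five decimals, any row can be regenerated, and the printed error allowances can be roughly checked. The sketch below rests on an assumption not stated in the table: that the first-order allowances refer to straight-line interpolation between arguments 0.1 apart, whose error is bounded by h²/8 times the largest |f″| on the interval. The function names are illustrative only.

```python
import math


def cbrt(x):
    """Real cube root of a positive number."""
    return x ** (1 / 3)


def root_row(n):
    """One row of Table XV D: N, √N, √10N, ∛N, ∛10N, ∛100N, each rounded to 5 decimals."""
    return (
        n,
        round(math.sqrt(n), 5),
        round(math.sqrt(10 * n), 5),
        round(cbrt(n), 5),
        round(cbrt(10 * n), 5),
        round(cbrt(100 * n), 5),
    )


def linear_error_sqrt(n, k=1, h=0.1):
    """Bound on linear-interpolation error for sqrt(k*N) on [n, n + h].

    |f''(N)| = sqrt(k) / 4 * N**-1.5 decreases with N, so it is largest at N = n.
    """
    return h ** 2 / 8 * math.sqrt(k) / 4 * n ** -1.5


def linear_error_cbrt(n, k=1, h=0.1):
    """Same bound for (k*N)**(1/3), using |f''(N)| = (2/9) * k**(1/3) * N**(-5/3)."""
    return h ** 2 / 8 * (2 / 9) * cbrt(k) * n ** (-5 / 3)


if __name__ == "__main__":
    for n in (2.0, 5.0, 10.0):
        print(root_row(n))
    print([round(linear_error_sqrt(1.0), 5), round(linear_error_sqrt(1.0, 10), 5),
           round(linear_error_cbrt(1.0), 5), round(linear_error_cbrt(1.0, 10), 5)])
```

Under that assumption the bound gives .00031, .00099, .00028, and .00060 for the first four entries of the first error row, and .00011 for √N near 2.0. For ∛100N near 1.0 it gives .00129 against the printed .00130, so the table may carry a slightly larger allowance than this simple bound.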
SECTION 5. SHORTER MATHEMATICAL TABLES LISTED
ACCORDING TO OCCURRENCE IN EARLIER CHAPTERS

Page

Table IV M. The Number of Classes to Use for Graphic Portrayal of Samples of Size N Drawn from a Normal Population   133

Table VI G. Ratios of Certain Measures of Variability to Their Standard Errors in the Case of Samples Drawn from a Normal Population   232

Table VII A. Relative Frequencies in a Pearson Type III (β₁ = 1, β₂ = 4.5) Distribution, for 2σ Intervals   251

Table VII D. Optimal Interval, in Terms of the Population σ, for the Computation of the Parabolically Smoothed Mode, for Samples of Size N Drawn from a Normal Population   259

Table VII G. Distribution of 1000 Measures the Logarithms of which Yield a Normal Distribution   270

Table VIII C. Normal Probabilities   293

Table IX E. Table of δ Values (for normalizing a variance ratio)   326

Table X C. Corrections When Computing ρ from Tied Ranks   369

Tables X H and X I. Normal Bivariate Distribution, in which r = .80, in Case of Coarse Grouping   390

Table XI A. Non-Chance Differences for Different Ratios, σ(due to chance)/σ(observed)   418

Table XI B. Weighting Factors Consequent to Values of the Reliability Coefficient   424

Table XI D. Relationship Between Correlation and an Imposed Curtailment of One of the Variables   430

Table XII A. Symbolic Expression of Constants Derived in Modified Doolittle Solution, 4 Variables   462

Tables XIII D and XIII E. Pearson Curve Types   511

Table XIII F. The Equi-Probable Range and the Mean Range of Samples Drawn from a Normal Population (in Terms of the Population σ)   531

Table XIII H. The Size (in σ units) of the Interval Giving a Probability of .5 that the Frequency of the Median Interval Shall Exceed that of Either Neighboring Interval, in the Case of a Normal Distribution; the Expected Range (in σ units); and the Number of Classes Necessary to Cover this Range   538

Table XIII I. Notation for Arguments, Tabled Entries and Differences   540

Table XIV A. Basic Factorials and Gamma Functions   587

Table XIV B. Of x = 2 sin⁻¹ √f
APPENDIX A

MATHEMATICAL BACKGROUND TEST


SECTION 1. THE BACKGROUND TEST

Herein is given a "Background Test for Elementary Statistics." The purpose of the test is threefold: (a) to enable a student to secure an idea as to whether his mathematical background is adequate for the pursuit of an elementary course in statistics, (b) to inform him of shortcomings in his background, if they exist, which he should remedy by study of the appropriate elementary topics as found in arithmetics and elementary algebras, and (c) to enable an instructor to classify his students and properly adapt instruction to them. The first two purposes may be realized by the student himself without the aid or knowledge of the instructor. To secure information (a), it is necessary that the student take the test in an entirely fair manner. He must observe the time limits, which, however, are not short. He must not coach himself on the items of the test by consulting others who know its content, or by referring to the scoring key given on the pages following the test. He must be rigorous in marking his own paper. It may happen that the student's answer differs from that given in the key, which is the sole guide for marking papers, in a way that the student realizes (though no one else would) is not due to faulty understanding. In this case he should mark his answer wrong, in spite of his personal knowledge that he correctly understands the problem and its solution. If he is not thus objective and rigorous in grading his own paper, he cannot interpret his position by comparing his score with the percentile scores given.

Page 1 of the test, calling for certain personal information about the student, provides a second, somewhat different, and somewhat less reliable means of estimating a student's preparation for pursuing elementary statistics. Items (a), (c), (d), (e), and (f) are probably pertinent but are not involved in the experience score as calculated from this page. Their inclusion on the page will help the student to a rounded-out picture of himself and of his equipment. The items which are scored and combined into an experience score are X1, X2, and X3, as explained herewith.

X1 is a score based on item (b), year in college, according to the following schedule:

X1 score    Year in college
    1       Freshman
    2       Sophomore
    3       Junior
    4       Senior
    5       First year of graduate study
    6       Second year of graduate study
    7       Third year of graduate study

X2 is a score based on item (g), the most advanced course in mathematics successfully pursued, according to the following schedule:

X2 score    Most advanced course in mathematics
    8       Arithmetic, i.e., no algebra
    9       1/2 year or 1 year of high school algebra
   10       Other high school mathematics, including a second course in algebra, trigonometry, or geometry
   11       College algebra, or answer omitted
   12       Calculus
   13       One course beyond calculus
   14       Two or more courses beyond calculus

X3 is a score based on related courses and quality of work, according to the following schedule:

Item (h) credit:   -1 for mark of D or F
                    0 for mark of C or I, or no answer
                    1 for mark of A or B

Item (i) credit:    0 for no course
                    1 for one or more courses (or undoubted equivalent in research work)

Item (j) credit:   -1 for mark of D or F
                    0 for mark of C or I, or no answer
                    1 for mark of A or B

X3 = sum of credits

The final experience score Xe is given by

Xe = 2X1 + 2X2 + X3
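The experience-score schedules are simple lookups plus a sum, so they can be encoded directly. This is a sketch under stated assumptions: the dictionary keys are hypothetical labels for the answers, and the final weights are taken as Xe = 2X1 + 2X2 + X3, read from a damaged line of the scan.

```python
# Hypothetical encoding of the X1, X2, X3 schedules; the answer labels are illustrative.
YEAR_SCORE = {
    "freshman": 1, "sophomore": 2, "junior": 3, "senior": 4,
    "graduate year 1": 5, "graduate year 2": 6, "graduate year 3": 7,
}
MATH_SCORE = {
    "arithmetic": 8,
    "high school algebra": 9,
    "other high school mathematics": 10,
    "college algebra": 11,
    "calculus": 12,
    "one course beyond calculus": 13,
    "two or more courses beyond calculus": 14,
}


def mark_credit(mark):
    """Credit for items (h) and (j): A or B -> 1; D or F -> -1; C, I, or no answer -> 0."""
    if mark in ("A", "B"):
        return 1
    if mark in ("D", "F"):
        return -1
    return 0


def experience_score(year, math_course, math_mark, stat_courses, stat_mark):
    """Experience score, assuming Xe = 2*X1 + 2*X2 + X3."""
    x1 = YEAR_SCORE[year]
    # An omitted answer to item (g) is scored like college algebra (11).
    x2 = 11 if math_course is None else MATH_SCORE[math_course]
    # X3: credit for item (h), item (i), and item (j).
    x3 = mark_credit(math_mark) + (1 if stat_courses > 0 else 0) + mark_credit(stat_mark)
    return 2 * x1 + 2 * x2 + x3


print(experience_score("senior", "calculus", "A", 1, "B"))  # 35
```

A senior with calculus (mark A) and one statistics course (mark B) thus scores 2(4) + 2(12) + 3 = 35.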
A BACKGROUND TEST FOR ELEMENTARY STATISTICS

Parts I, II

FILL IN FULLY THE INFORMATION ASKED FOR ON THIS PAGE

(a) Name ________  Institution ________  Date ________

(b) If an undergraduate student, encircle the year in college: 1 2 3 4. If a graduate student, encircle the year of graduate work: 1 2 3.

(c) Candidate for a degree? Yes ___ No ___ If so, for what degree? ________

(d) Degree (or degrees) held ________  Institution ________  Date ________

(e) Teaching, research, or administrative position now held, if any? ________

(f) Experimental, measurement, and statistical duties connected therewith, if any? ________

Courses and experience connected with measurement and statistics:

(g) Most advanced course in mathematics pursued ________  Date ________  Institution ________

(h) Using the marking system: A = highest quarter; B = next to highest quarter; C = next to lowest quarter; D = lowest quarter; F = failure; I = incomplete, course not finished; give yourself a mark to indicate as nearly as you can your standing in the course just mentioned. ________

(i) Course or courses in measurement and statistics ________  Date ________  Institution ________

(j) As above, for each course, in the order mentioned, give yourself a mark A, B, C, D, F, or I. ________

SUMMARY 
The purpose of this test is to enable an instructor of statistics to estimate the sort of work which the students of his class may pursue profitably, and to enable students of statistics in general to appraise their own mathematical equipment. Some students may be prepared for advanced work; therefore, the test contains items which many promising candidates for an elementary course will be unable to do. No student should feel discouraged, and each one should do all of those items which lie within his capacity. Work as rapidly as is consistent with accuracy.

INSTRUCTIONS TO ADMINISTRATOR OF TEST: 

After 35 minutes working time announce, "Even if you have not finished Part I, start on Part II."

After another 7 minutes, or a total working time of 42 minutes, call time.

An intermission is recommended before starting Parts III, IV, and V.

SUMMARY

NO. RIGHT     WEIGHT, OR MULTIPLYING FACTOR     (NO. RT.) × (WT.)

Crude Total ________

PARTS I AND II SCORE FROM TABLE BELOW ________

[Table: Crude Total and corresponding Equivalent Score; entries illegible in the scan]
PART I. COMPUTATION

A. Addition and Subtraction

Carry all answers to the greatest number of decimal places found in the data of the problem, unless otherwise specified. Designate the answers clearly.

1. Add 32.784, 764.46, 1.2733, .0056.
2. Add 73.8, -13.4, .632, -.076.
3. Subtract 17.384 from 8.2631.
4. Subtract .0758 from 1.6.
5. Subtract -.848 from 3 3/4.

B. Multiplication

Carry answers to as many decimal places as the sum of the decimal places in the two factors. When the common fractions given are changed to decimals, round them to two decimal places.

6. Multiply 73.18 by .062.
7. Multiply 13 1/6 by 74.
8. Multiply -1.14 by 14.32.
9. Multiply -.037 by 7/16.
10. Multiply 8/14 by 10/9. (Express answer as a decimal to 3 places.)

C. Division

Carry answers to three significant figures.

11. Divide .1145 by 3.76.
12. Divide 9 7/12 by 24.28.
13. Divide 1 3/4 by 72 4/9.

D. Roots and Powers

Carry square root answers to as many significant figures as are given in the number the square root of which is to be extracted.

14. Extract the square root of 5776.
15. Extract the square root of .00049.
16. Extract the square root of -169.
17. Expand (9)².

E. Percentage

Keep answers to the nearest integer.

18. Find 52 percent of 165.
19. Find .52 percent of 916.
20. The ratio of 3 to 4 equals the ratio of 7.5 to what?

(Go right on to Part II)
PART II. EXPONENTS

Simplify the following:

1. (4³)(4²) =
10. (4x³)² =
11. (x⁵)² =
20. 3x³(c - d)³ / [-x²(c - d)²] =

[Items 2-9 and 12-19: expressions illegible in the scan]

(Go right on to Part III)
A BACKGROUND TEST FOR ELEMENTARY STATISTICS

PARTS III, IV, V

Name ________  Institution ________

Date ________

INSTRUCTIONS TO ADMINISTRATOR OF TEST:

After 35 minutes working time announce, "Even if you have not finished Part III, start on Part IV."

After another 2 minutes announce, "Even if you have not finished Part IV, start on Part V."

After another 15 minutes, or a total working time of 52 minutes, collect all booklets.

NO. RIGHT     (NO. RT.) × (WT.)

CRUDE TOTAL, PARTS III, IV, AND V ________

PARTS III, IV, AND V SCORE FROM TABLE BELOW: EQUIVALENT SCORE ________
PART III. ALGEBRA

1. Add 5x + 8 and 3x + 4y - 3.
2. Add 4x² - 3x + xy - 4 and 3x² + 6x - y² - 2x² - 2.
3. Subtract 4x⁴ - 5x² + 3x - 5 from x⁵ + 9x⁴ - 2x + 3.
4. Multiply 4x² by 7x.
5. Multiply 4x² + 3 by -4xy.
6. Multiply 4x² - 3x - 5 by 2x - 4.
7. Expand (m - n)².
8. Factor m² - n².
9. Write from memory the first 4 terms of (m + n)^a. (Do not derive.)
10. Solve the equation 6x - 12 = 0.
11. What are the roots of x² - 6x + 8 = 0?
12. What are the roots of 6x² + 2x - 20 = 0?
13. The number of real or imaginary roots of 5x³ + 3x² + x - 4 = 0 is ____.
14. Given the formula t = [formula illegible in the scan], what value will t have if a = 8 and b = 2?
15. w = [formula illegible in the scan]. If ΣP² = 30 and K = 5, solve for w to two decimal places.
16. D = [formula and given quantities illegible in the scan; the values given are 25, 170, 4, and 84]. Solve for D.
17. The two sides of a right triangle are 8 and 1.8. What is the length of the hypotenuse?
18. The quotient of two numbers is y and the divisor is n. Represent the dividend.
19. ax² + bx + c = 0 is a typical quadratic equation, the solution of which is given by

        x = (-b ± √(b² - 4ac)) / 2a

    Express the roots of 5x² + 2x - 4 = 0 by means of this formula, but do not perform the indicated arithmetic computation.
20. Solve simultaneously:   2x - 3y = 7
                            3x + y = 5

(Go right on to Part IV)
PART IV. TRIGONOMETRY

[Figure not reproduced in the scan]

1. What is the sine of angle X in the adjacent figure?
2. The cosine of angle Y?
3. The tangent of angle Y?
4. What is the numerical value of the tangent of 45 degrees?
5. The cosine of 0 degrees is ____.
6. What is the greatest possible value of the sine of an angle?

(Go right on to Part V)
PART V. ANALYTIC GEOMETRY

If the quadrants made by perpendicular coordinate axes are numbered as in the diagram, indicate as a result of inspection, in the column headed quadrant, the number of the quadrant in which each of the following points lies.

[Diagram not reproduced in the scan]

         x      y     quadrant
1.      -2      ?
2.      -4     -5
3.       2      3
4.       9     -3
5.       5     -6
6.       ?     -7

(? = value illegible in the scan)

7. A straight line is represented by an equation in x and y of the ____ degree.
8. The line 5x + 4y = 12 intercepts the Y axis at what ________?
9. The slope of the line y = 5x - 3 is ____.
10. The slope of the line 5x - 3y = 4 is ____.
11. The slope of the line passing through the point (x = 6, y = 7) and the point (x = 2, y = 5) is ____.
12. 4Y + 5X = -10 is the equation of a line. Shifting axes 2 units to the right and 2 units up, and using x and y to designate the new variables, what then is the equation of the line?
SECTION 2. SCORING KEYS

A BACKGROUND TEST FOR ELEMENTARY STATISTICS
PARTS I, II

PART I: COMPUTATION
EACH ITEM WEIGHTED 1

1. 798.5229
2. 60.956
3. -9.1209
4. 1.5242
5. 4.598
6. 4.53716
7. 52.4258 or 62.41
8. -16.3248
9. -.01628 or -.0162 or -.016
10. .635 or .634 or .633
11. .030 or .03 or .0304 or .0305
12. .395 or .394
13. .024 or .0241
14. 76. or 76 or 76.00 or 76.0, or ± either
15. .022 or .02214
16. 13i or ±13i
17. 81
18. 86 (credit for 85.80)
19. 5 (credit for 4.7632)
20. 10

PART II: EXPONENTS
EACH ITEM WEIGHTED 1

1. 4⁵ or 1024
2. x⁶
3. -x⁸
4. [illegible in the scan]
5. x²
6. [illegible in the scan]
7. x^(y+2)
8. [illegible in the scan]
9. [illegible in the scan]
10. 16x⁶ or (4)²x⁶
11. x¹⁰
12. y³ or ±y³
13. x^c or ±x^c
14. [illegible in the scan]
15. x^(2a-2) or x^(2(a-1))
16. z³
17. m
18. n¹⁰
19. x⁻⁶ or 1/x⁶
20. -3x(c - d) or -3xc + 3xd

SCORING KEY

A BACKGROUND TEST FOR ELEMENTARY STATISTICS
PART III

PART III: ALGEBRA
EACH ITEM WEIGHTED 1

1. 8x + 4y + 5
2. 5x² + 3x + xy - y² - 6
3. x⁵ + 5x⁴ + 5x² - 5x + 8
4. 28x³
5. -16x³y - 12xy
6. 8x³ - 22x² + 2x + 20
7. m² - 2mn + n²
8. (m - n)(m + n)
9. m^a + a·m^(a-1)·n + [a(a - 1)/2]·m^(a-2)·n² + [a(a - 1)(a - 2)/6]·m^(a-3)·n³ (the denominators may also be written 2! and 3!)
10. 2
11. 2, 4
12. 5/3 (or 1 2/3), -2
13. 3
14. 8
15. -.36
16. 30
17. 8.2 or √67.24
18. ny
19. (-2 ± √84)/10, or (-2 ± √(4 + 80))/10
20. x = 2, y = -1
SCORING KEY

A BACKGROUND TEST FOR ELEMENTARY STATISTICS
PARTS IV, V

PART IV: TRIGONOMETRY
EACH ITEM WEIGHTED 1/3

1-3. [Answers refer to the figure; illegible in the scan]
4. 1
5. 1
6. 1

PART V: ANALYTIC GEOMETRY

1-6. 2, 3, 1, 4, 4, 3 (each weighted 1/6)
7. 1st or linear (weight 1/3)
8. 3 or y = 3 (weight 2/3)
9. 5 (weight 1)
10. 5/3 (weight 1)
11. .5 or 1/2 (weight 2)
12. 4y + 5x = -28 (weight 4)
SECTION 3. PERCENTILE NORMS AND RELIABILITY

Percentile equivalents for the Experience Score and for the Total Background Test Score are given in Table Appx. A. They are based upon 406 students beginning elementary statistics in courses at (a) Columbia University (Teachers College, N = 52), (b) University of Illinois (N = 39), (c) Harvard University (Graduate School of Education, N = 29), (d) University of Minnesota (N = 88), (e) University of Oregon (N = 153), and (f) Stanford University (N = 45). Percentiles for the Parts separately are not provided, but the "equivalent" scores are such that a person's Part I and II score will equal his Part III, IV, and V score if he is strictly typical, that is, equally able in the subject matter of these different parts. In this table, Xb is the Background Test Score and Xe the Experience Score.

The standard error of Xe as a measure of whatever it measures is unknown, because a second similar measure is not available, nor can the elements entering into Xe be divided into two comparable halves and reliability determined by the Spearman-Brown formula. The standard error of Xb, when taken as evidence of the true score X̃b, is 1.9. This is equivalent to a reliability coefficient of .80 in the group of 406, which has a standard deviation of 4.40. The correlation between Xe and Xb for the 406 cases is .65. Certain small samples suggest that the correlations between a term grade in an elementary course in statistics and the Xe and Xb scores, received four months earlier, are in the neighborhood of .4 and .6, respectively.
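The link between the standard error of 1.9, the standard deviation of 4.40, and the reliability of .80 can be checked numerically. This sketch assumes the usual classical relation SE = σ√(1 - r); the function names are illustrative.

```python
import math


def reliability_from_sem(sem, sd):
    """Reliability implied by a standard error of measurement: sem = sd * sqrt(1 - r)."""
    return 1 - (sem / sd) ** 2


def sem_from_reliability(r, sd):
    """Standard error of measurement implied by a reliability coefficient r."""
    return sd * math.sqrt(1 - r)


print(round(reliability_from_sem(1.9, 4.40), 3))   # 0.814
print(round(sem_from_reliability(0.80, 4.40), 2))  # 1.97
```

The two published figures agree only approximately: a standard error of 1.9 implies a reliability near .81, while r = .80 implies a standard error near 1.97.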
TABLE Appx. A. PERCENTILES

              Xb            Xe                          Xb            Xe
PERCENTILES   BACKGROUND    EXPERIENCE    PERCENTILES   BACKGROUND    EXPERIENCE
              TEST SCORE    SCORE                       TEST SCORE    SCORE

P.01          11            22            P.55          27            32
P.02          12            23            P.60          28            32
P.05          14            24            P.65          30            33
P.10          16            24            P.70          31            34
P.15          17            25            P.75          32            34
P.20          18            26            P.80          34            35
P.25          19            26            P.85          37            36
P.30          21            27            P.90          39            37
P.35          22            28            P.95          42            38
P.40          24            29            P.98          45            40
P.45          24            30            P.99          46            41
P.50          25            31





APPENDIX B 

REFERENCE LISTS 

SECTION 1. LIST OF COMMON STATISTICAL SYMBOLS

Herein are given the more common meanings attached to symbols as used in this text, and also the more common meanings as used in statistical texts in general.

Throughout, as used in this text, a bar over a symbol, e.g., V̄, indicates an unbiased estimate or a mean value, and a tilde, e.g., Ṽ, a population, hypothetical, or true value.

THE GREEK ALPHABET

Α  α  Alpha       Ι  ι  Iota        Ρ  ρ  Rho
Β  β  Beta        Κ  κ  Kappa       Σ  σ  Sigma
Γ  γ  Gamma       Λ  λ  Lambda      Τ  τ  Tau
Δ  δ  Delta       Μ  μ  Mu          Υ  υ  Upsilon
Ε  ε  Epsilon     Ν  ν  Nu          Φ  φ  Phi
Ζ  ζ  Zeta        Ξ  ξ  Xi          Χ  χ  Chi
Η  η  Eta         Ο  ο  Omicron     Ψ  ψ  Psi
Θ  θ  Theta       Π  π  Pi          Ω  ω  Omega
LIST OF COMMON STATISTICAL SYMBOLS - LITERAL SYMBOLS

a, A, and α

a, - the constant term in a regression equation.

a, - a and b are employed in connection with boundary values of a sequential analysis likelihood ratio.

aᵢⱼ, - an element in a matrix at the intersection of the i-th row and the j-th column. Ch. XIV, Sec. 2. [14:01]

A, - Arbitrary Origin, as employed by Yule and Kendall.

Aᵢⱼ, - a cofactor. Ch. XIV, Sec. 2. [14:12]

A (as a matrix symbol), - common notation for a matrix. Ch. XIV, Sec. 2. [14:01]

A′, - a matrix that is the transpose of A. [14:02]

A⁻¹, - a matrix that is the inverse of A. [14:06]

α, - the proportion of cases in a cell in a 2×2-fold. The other proportions are β, γ, and δ.

α, - in Pearson's Tables, α/2 is the area between the mean and the point of dichotomy in a unit normal distribution.

α, - a measure of risk in sequential analysis.
b, B, and β

b, - a regression coefficient.

b, - see a (sequential analysis).

bᵢ, - an abridged notation for b₀ᵢ.₁₂...₍ᵢ₎...ₖ, a regression coefficient in a multiple regression equation.

β, - a standard score regression coefficient.

β, - a measure of risk in sequential analysis.

βᵢ, - an abridged notation for β₀ᵢ.₁₂...₍ᵢ₎...ₖ, a regression coefficient in a multiple standard score regression equation.

B(p,q), - the Beta Function. See Table XV A. 1934-1938 Biom.

Bₓ(p,q), - the incomplete Beta Function. See Table XV A. 1934-1938 Biom. See Iₓ(p,q).

β₁ = μ₃²/μ₂³, β₂ = μ₄/μ₂², β₃ = μ₃μ₅/μ₂⁴, β₄ = μ₆/μ₂³, β₅ = μ₃μ₇/μ₂⁵, β₆ = μ₈/μ₂⁴, in which the μ's are moments from the mean, and μ₂ = V as used in this text.

These β's are used especially in connection with the Pearson system of curves.
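The first two β's follow directly from the moments about the mean. A minimal sketch, assuming moments computed with divisor N as in this text (function names illustrative):

```python
def central_moments(xs, orders=(2, 3, 4)):
    """Moments about the mean, mu_r = sum((x - M)**r) / N (divisor N)."""
    n = len(xs)
    mean = sum(xs) / n
    return {r: sum((x - mean) ** r for x in xs) / n for r in orders}


def pearson_betas(xs):
    """beta_1 = mu_3**2 / mu_2**3 and beta_2 = mu_4 / mu_2**2."""
    mu = central_moments(xs)
    return mu[3] ** 2 / mu[2] ** 3, mu[4] / mu[2] ** 2


print(pearson_betas([1, 2, 3, 4, 5]))  # (0.0, 1.7)
```

A symmetric set gives β₁ = 0; the flat set above gives β₂ = 1.7, below the normal-curve value of 3.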

c and C (for γ and Γ see g; for χ see end of alphabet)

c, - as a preceding subscript indicates a correction for coarseness of grouping.

cᵢⱼ, - the covariance, or product moment, between the i and j variables.

c₋₂, c₋₁, c₀, c₁, c₂, c₃, c₄, - Lagrangian interpolation coefficients.

C, - a contingency coefficient: C² = φ²/(1 + φ²) = χ²/(χ² + N).

Cᵢⱼ, - a cofactor.

Cᵢⱼ, - is used in this text to mean ΣXᵢXⱼ/N. See cᵢⱼ.

cm, - as a preceding subscript indicates a correction for use of class means.

C(n,m), or ₙCₘ, etc., - the number of combinations of n things m at a time. C(n,m) = n!/[m!(n - m)!]. See P(n,m).
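Two entries in this group reduce to short computations: the number of combinations and the contingency coefficient. The sketch assumes the standard relation φ² = χ²/N for the mean square contingency; the function names are illustrative.

```python
import math


def combinations(n, m):
    """C(n, m) = n! / [m! (n - m)!]: combinations of n things taken m at a time."""
    return math.factorial(n) // (math.factorial(m) * math.factorial(n - m))


def contingency_coefficient(chi2, n):
    """C from chi-square and N, assuming phi**2 = chi2 / N, so C**2 = phi**2 / (1 + phi**2)."""
    phi2 = chi2 / n
    return math.sqrt(phi2 / (1 + phi2))


print(combinations(5, 2))                           # 10
print(round(contingency_coefficient(20.0, 80), 4))  # 0.4472
```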
d, D, δ, and Δ

d, - a deviation.

d, - a difference in scores, in ranks, etc.

d, - (also with subscripts) the mean deviation of a portion of a unit normal distribution. [8:26] and [8:27]

dᵢⱼ, - a difference between standard scores zᵢ and zⱼ. [11:28]

dₓ, - in actuarial statistics, the number dying during the x-th year of life.

d.o.f., - the number of degrees of freedom.

D, - a difference in rank when computing the correlation between ranks.

δ, - see first definition of α. δ is the value tabled in Pearson's Tables for different tetrachoric coefficients.

δ and Δ frequently indicate small increments. As they approach infinitesimals they approach the differentials of calculus.

δᵢⱼ, - Kronecker's δᵢⱼ is such a quantity that it = 1 when the two subscripts are the same and otherwise it = 0. If the a's in the matrix

    a₁₁  a₁₂  ...  a₁ₖ
    a₂₁  a₂₂  ...  a₂ₖ
    ...  ...  ...  ...
    aₖ₁  aₖ₂  ...  aₖₖ

are such that the sum of their squares for any row, or column, = 1, and the sum of the products of corresponding members for any two columns = 0, then Σₘ aₘᵢaₘⱼ = δᵢⱼ, the sum running over the k rows.
Δ as a subscript means "dependent upon." It is the opposite of the dot as a subscript, which means "independent of." It is employed herein in connection with scores, correlation coefficients, regression coefficients, variances, covariances, etc.

Δ, - a major determinant. Δᵢⱼ is the minor determinant after row i and column j have been deleted from Δ. Ch. XIV, Sec. 2, [14:16].

Δ, - with superscripts is used to indicate differences in tabled values. Ch. XIII, Sec. 12. Table XIII I.

e, E, and ε

e, - the Naperian base of logarithms. It = 2.71828 18284 59045, etc.

exp, - in an equation, the expression following "exp" is the exponent of e.

E, - the expected value, the expectancy, or the mean value.

ε, - a small quantity.

ε, - an unbiased correlation ratio.

f, F, φ, and Φ

f, - a frequency, or number of cases.

f, - as a preceding subscript indicates a correction for fineness of grouping.

f(x), - a function of x.

f′(x), - the first derivative of f(x); thus, if y = f(x), then dy/dx = f′(x). f″, - the second derivative, etc.

[Entry illegible in the scan.] Ch. IX, Sec. 5, [9:22].

fᵢⱼ, - the frequency in the cell at the intersection of the i row and the j column.
F, - when computing percentiles, F = the sum of f's up to a certain point.

Fᵢⱼ, - a variance ratio having i d.o.f. in the numerator and j d.o.f. in the denominator. As herein used, the denominator variance is the error variance. An alternative and common usage is so to write Fᵢⱼ that it > 1.

φ, - a common designation for an angle.

φ, φ(x), - a function of x.

φ, - Yule's φ is a product-moment correlation coefficient in a 2×2-fold.

φ², - the mean square contingency. This φ² = Yule's φ squared in the case of a 2×2-fold only.

g, γ, and Γ

g₁, g₂, - as used by Fisher, g₁ = k₃/k₂^(3/2) and g₂ = k₄/k₂². These are sample estimates of his population parameters γ₁ and γ₂.

integral. See Ch. XV, Sec. 1. Pearson (1914).

γ, - Fisher's γ₁ = κ₃/κ₂^(3/2) and γ₂ = κ₄/κ₂². As used by Fisher, Greek letters indicate population, not sample, values.

Γ, - the gamma function, a concept of factorial extended to numbers other than integers. See I.
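The gamma function and the complete Beta function listed earlier are tied together by B(p, q) = Γ(p)Γ(q)/Γ(p + q). A short sketch using Python's standard `math.gamma` (the helper name `beta_function` is illustrative):

```python
import math


def beta_function(p, q):
    """Complete Beta function B(p, q) = Γ(p) Γ(q) / Γ(p + q)."""
    return math.gamma(p) * math.gamma(q) / math.gamma(p + q)


# Γ extends the factorial: Γ(n + 1) = n! for n = 0, 1, 2, ...
print(math.gamma(5))          # 24.0
print(math.gamma(0.5) ** 2)   # approximately pi
print(beta_function(2, 3))    # approximately 1/12
```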

h, H, and η

h, - see k.

h, - h₁ and h₂ are intercepts in sequential analysis.

H, - hypothesis. H with following notation commonly means "under the hypothesis indicated by the following notation."

η, - a correlation ratio. See ε.
i, I, and ι

i, - the imaginary quantity √-1. This universal use of i has not been called for in this text.

i, - the size of the grouping interval.

i or r, - an interest rate.

i and j, as subscripts of F and in other connections, are used herein to indicate the numbers of d.o.f. Herein i′ and j′, as subscripts of V, indicate the variances of the i and j variables, the prime being used to avoid conflict with the use of i and j to indicate d.o.f.

i as a subscript of a variable, or connected with a summation sign, stands for any of a number of variables, usually from 1 to k. j has a similar meaning for a second series of variables, usually from a to

I, - the area from the mean to the point x in the unit normal distribution. I of Table XV C = α/2 of Sheppard's table, given in Pearson's Tables.

I, - the identity matrix: AA⁻¹ = I. Ch. XIV, Sec. 2, [14:05].

I(u,p) of Pearson's Tables of the Incomplete Gamma Function = Γₓ(p + 1)/Γ(p + 1).

Iₓ(p,q) of Pearson's Tables of the Incomplete Beta Function = Bₓ(p,q)/B(p,q).
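The area I from the mean to x in the unit normal distribution can be computed from the error function, since I(x) = erf(x/√2)/2. A sketch using the standard `math.erf`; the function name is illustrative, and taking the area as positive for negative x is an assumption.

```python
import math


def normal_area_from_mean(x):
    """Area under the unit normal curve between the mean (0) and the point x.

    Uses I(x) = erf(|x| / sqrt(2)) / 2; the area is taken as positive for x < 0.
    """
    return 0.5 * math.erf(abs(x) / math.sqrt(2))


print(round(normal_area_from_mean(1.0), 5))   # 0.34134
print(round(normal_area_from_mean(1.96), 4))  # 0.475
```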

j and J

j, - see i.

Jₘ(x), - a common designation of a Bessel function of the first kind.
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k, and k 

k, - is a coeffi cient of alienation. It “ /\~r 2 . 

k, - the number of classes in a categorical series; also the specific designation of the k-th class. h is used with a similar meaning if a second series is involved.

k₁, k₂, k₃, k₄, as used by Fisher, are unbiased sample estimates of the cumulants κ₁, κ₂, κ₃, κ₄.
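The k-statistics can be computed from the power sums of the observations. The formulas in this Python sketch are the standard textbook expressions for k₁ through k₄. They are not reproduced from this text, so treat them as an assumption to be checked against Fisher's definitions. The function names are hypothetical, and `fisher_g` forms the g₁ and g₂ defined under g.

```python
def k_statistics(xs):
    """Return Fisher's k1, k2, k3, k4 computed from the power sums of xs."""
    n = len(xs)
    if n < 4:
        raise ValueError("k4 requires at least four observations")
    s1 = sum(xs)
    s2 = sum(x ** 2 for x in xs)
    s3 = sum(x ** 3 for x in xs)
    s4 = sum(x ** 4 for x in xs)
    k1 = s1 / n
    k2 = (n * s2 - s1 ** 2) / (n * (n - 1))
    k3 = (n ** 2 * s3 - 3 * n * s2 * s1 + 2 * s1 ** 3) / (n * (n - 1) * (n - 2))
    k4 = ((n ** 3 + n ** 2) * s4 - 4 * (n ** 2 + n) * s3 * s1
          - 3 * (n ** 2 - n) * s2 ** 2 + 12 * n * s2 * s1 ** 2 - 6 * s1 ** 4) / (
          n * (n - 1) * (n - 2) * (n - 3))
    return k1, k2, k3, k4

def fisher_g(xs):
    """g1 = k3 / k2**1.5 and g2 = k4 / k2**2."""
    _, k2, k3, k4 = k_statistics(xs)
    return k3 / k2 ** 1.5, k4 / k2 ** 2

print(k_statistics([1, 2, 3, 4, 5]))  # (3.0, 2.5, 0.0, -7.5)
print(fisher_g([1, 2, 3, 4, 5]))
```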

κ, - Fisher's cumulants are population statistics: $\kappa_1 = \tilde{M}$; $\kappa_2 = \tilde{\mu}_2 = \tilde{V}$; $\kappa_3 = \tilde{\mu}_3$; $\kappa_4 + 3\kappa_2^2 = \tilde{\mu}_4$, in which the tilde measures are as used in this text.

l, L, and λ

l, - a direction cosine, usually in n-dimensional space.

see the second definition of k.

$l_x$, - in actuarial statistics, the number of persons living at age x.

L, - a likelihood ratio.

log, - also log₁₀, logarithm to the base 10.
ln, - also logₑ, logarithm to the base e.

λ, - a direction cosine, usually in n-dimensional space.

λ, - a Lagrange multiplier. [14:118]

λ, - in the expression "λ-shaped," indicates a unimodal distribution.

m, M, and μ

M, - the arithmetic mean.

μ, - a moment from the mean: $\mu_1 = \Sigma x/N = 0$; $\mu_2 = \Sigma x^2/N = V = \sigma^2$; $\mu_3 = \Sigma x^3/N$; $\mu_4 = \Sigma x^4/N$; etc.

μ′, - a moment from zero: $\mu'_2 = \Sigma X^2/N$; $\mu'_3 = \Sigma X^3/N$; $\mu'_4 = \Sigma X^4/N$; etc.
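The two kinds of moments, and the link between them through M = μ′₁, can be illustrated with a short Python sketch. The function names and the five scores are hypothetical.

```python
def moments_about_mean(xs, r_max=4):
    """[μ1, μ2, ..., μr] with μr = Σx**r / N, x being deviations from the mean."""
    n = len(xs)
    m = sum(xs) / n
    return [sum((x - m) ** r for x in xs) / n for r in range(1, r_max + 1)]

def moments_about_zero(xs, r_max=4):
    """[μ'1, μ'2, ..., μ'r] with μ'r = ΣX**r / N, X being raw scores."""
    n = len(xs)
    return [sum(x ** r for x in xs) / n for r in range(1, r_max + 1)]

scores = [1, 2, 3, 4, 5]
mu = moments_about_mean(scores)      # [0.0, 2.0, 0.0, 6.8]
mu_p = moments_about_zero(scores)    # [3.0, 11.0, 45.0, 195.8]
# μ2 = μ'2 - M**2, since μ'1 = M.
print(mu[1], mu_p[1] - mu_p[0] ** 2)
```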

n, N, and ν

n, - d.o.f.; i.e., n is synonymous with i as used herein.

n, - a common designation of the last item in a series; i.e., n is synonymous with k as used herein.

n, - a common designation of the exponent of a binomial or polynomial.

n, - number of cases in a sample (though N is more frequently so used).

n̄, - average sample number.

N, - the number of cases in a sample.

ν, - Greek nu, sometimes used with the meaning of n, preceding.

o and O

o, - as a subscript in connection with index numbers, it commonly refers to the basal date.

o, - as a subscript in H₀, indicates the null hypothesis.

o, - a common designation of the criterion variable.

OC, - operating characteristic curve.
p, P, π, and Π

p, - a probability, i.e., a proportion of cases. If there are but two classes, q is the other proportion, so that p + q = 1. If there are k classes, p₁ + p₂ + ... + p_k = 1. Some writers employ p + q + r + s = 1 in lieu of p₁ + p₂ + p₃ + p₄ = 1.

p, - a price index. 
p, - in interpolation, the ratio of the distance of the value in question from the next smaller tabled argument to the distance between the two neighboring arguments.
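With this p, linear (two-point) interpolation takes the tabled value at the smaller argument plus p times the tabular difference. The Python sketch below is hypothetical. The two four-place normal-curve areas are used only for illustration, and more accurate work would call for three- or four-point (e.g., Lagrangian) formulas.

```python
def interpolate(args, values, x):
    """Linear interpolation in a table with ascending arguments.

    p is the distance of x from the next smaller argument, expressed as a
    proportion of the distance between the two neighboring arguments.
    """
    for lo, hi, f_lo, f_hi in zip(args, args[1:], values, values[1:]):
        if lo <= x <= hi:
            p = (x - lo) / (hi - lo)
            return f_lo + p * (f_hi - f_lo)
    raise ValueError("x lies outside the tabled range")

# Areas I from the mean, tabled at x = 1.0 and 1.1 (to four places).
print(round(interpolate([1.0, 1.1], [0.3413, 0.3643], 1.04), 4))  # 0.3505
```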

p, - in a dichotomized unit normal distribution, p, as herein used, is the proportion to the left and q that to the right of the point of dichotomy. As tabled in this text, p > q.

p.. # - a product-moment, usually involving de- 
viations from means. Frequently p 5 ; is 

defined as is c. . herein. Another usage 
defines p-j k | as = I in which 

the exponents may take any positive inte- 
gral values, including that of zero. Karl 

Pearson has used p. j kj to mean p ? . k , , as 
here defined. 

$p_{ij}$, - a price ratio, the price at date i divided by the price at date j. See $q_{ij}$.

$p_x$, - in actuarial statistics, the probability of a person of age x living one year.

P, - a probability, usually of a divergence as great in absolute value as that observed. With one d.o.f. and a normal distribution, P = 2q, as given in the first definition under p.

P, - a price index. 

P . j , - a product moment. Sometimes P . . is de- 
fined as C f . herein. As herein used, P ; - ki 
= IX^X^X^X^/N. Karl Pearson uses . k ( to 
mean P . . k 1 , , as here defined . 

P(n,m), or ₙPₘ, or Pⁿₘ, - the number of permutations of n things m at a time. $P(n,m) = n!/(n-m)!$. See C(n,m).

$P_p$, - a percentile; $P_p$ is the (100p) percentile.

π, - the ratio of the circumference of a circle to its diameter. π = 3.14159 26535 89793 ...

Π, - as a symbol of operation, means the product of all those magnitudes immediately following; thus $\prod_{i=1}^{k} x_i = x_1 \times x_2 \times \cdots \times x_k$.

q and Q

q, - see first and third definitions of p.

$q_{ij}$, - a quantity ratio, the quantity at date i divided by the quantity at date j. See $p_{ij}$.

$Q_{ij}$, - the area in a unit normal distribution between $x_i$ and $x_j$.

$q_x$, - in actuarial statistics, the probability of a person of age x dying within one year. $q_x = d_x/l_x$.
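The relations $q_x = d_x/l_x$ and $p_x = 1 - q_x$ can be applied down an $l_x$ column. A minimal Python sketch; the function name and the three survivor counts are hypothetical.

```python
def life_table_rates(l):
    """From l[x], the number living at successive ages, return (d_x, q_x, p_x) for each age."""
    rates = []
    for alive_now, alive_next in zip(l, l[1:]):
        d = alive_now - alive_next          # d_x: deaths between age x and x + 1
        q = d / alive_now                   # q_x = d_x / l_x
        rates.append((d, q, 1 - q))         # p_x = 1 - q_x
    return rates

# Hypothetical l_x column for three successive ages.
for d, q, p in life_table_rates([1000, 980, 950]):
    print(d, round(q, 4), round(p, 4))
```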

q, - a quantity index.

Q, - a quotient.

Q, - a Lexian ratio, which, when i = d.o.f., = $\chi^2/i$ = the variance ratio $F_{i\infty}$.
r, R, and ρ

r,- a product-moment correlation coefficient. 

r , - with preceding and following subscripts, 
other than those designating variables, a 
correlation coefficient with various cor- 
rections and under special conditions. 

r,- a ratio. 

r, or i , - an interest rate. 

r 12 , - may be called a total correlation coef- 
ficient to distinguish it from partial cor- 
relation coefficients. 

r₁₂.₃, r₁₂.₃₄, r₁₂.₃₄₅, etc., are partial correlation coefficients.





REFERENCE LISTS 


r iA23* r iA 2 3 4 ' etc. , as herein used, designate 
multiple correlation coefficients. See Ch. 
XI, Sec. 3 . 

r lm2 y r i - 2 3 4 ’ etc., have been used to desig- 
nate multiple correlation coefficients. 
Such usage is avoided in this text for 
reasons given in Ch. XI, Sec. 3 . 

r i t 23 ) * r u 2 3 4 ) * etc.,- an alternative nota- 
tion to 23^ 1 At 23ij,#et C-. 

( 23 j » ^ n 23 4 } i etc. , are an alternative no- 
tation for r 2 3 > 341 etc. 

ρ, - designates the correlation coefficient based upon the squares of differences in rank.
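The conventional form of this coefficient is ρ = 1 - 6Σd²/[N(N² - 1)], d being the difference in rank. The Python sketch below assumes that formula and untied ranks; ties would need a correction such as the one the index attributes to Horn. The helper names are hypothetical.

```python
def ranks(values):
    """Rank 1 for the smallest value; ties are rejected rather than averaged."""
    if len(set(values)) != len(values):
        raise ValueError("tied values need a correction for ties")
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def rank_correlation(x, y):
    """ρ = 1 - 6 Σd² / (N(N² - 1)), d being the difference in rank."""
    n = len(x)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))

print(rank_correlation([10, 20, 30, 40, 50], [12, 25, 21, 45, 50]))  # 0.9
```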

s, S, σ, and Σ

s, - R. A. Fisher uses s to indicate an estimate of σ, which, in his usage, is a population value; s² is an unbiased estimate of σ². Thus $s^2 = \frac{N}{N-1}V$, V being as used herein.
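The factor N/(N - 1) converts the text's V into Fisher's s². A minimal Python sketch with hypothetical function names.

```python
def variance_V(xs):
    """V = Σx²/N, the variance as used in this text (deviations from the mean)."""
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / n

def fisher_s2(xs):
    """Fisher's s² = N/(N - 1) × V, an unbiased estimate of the population variance."""
    n = len(xs)
    return n / (n - 1) * variance_V(xs)

data = [1, 2, 3, 4, 5]
print(variance_V(data), fisher_s2(data))  # 2.0 2.5
```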

s, - as a preceding subscript to r , or in- 
dicates a correction for shrinkage. 

s , - the indifferent value of a proportion, or 
of a mean, which is also the slope of 
boundary lines in sequential analysis. 

S, - as a symbol of operation, indicates a summation. As used herein, Σ is a summation with reference to N, the cases in the sample, and S a summation with reference to the number of classes, or the like.

σ, - the standard deviation, which equals the square root of the variance. As used by R. A. Fisher, σ is a population, not a sample, value.
Σ, - as a symbol of operation, indicates a summation. See S. In this text Σ indicates a summation of N terms unless specially noted; thus $\sum^{N(N-1)}$ is a summation of N(N-1) terms, and $\sum_{i=1}^{k}$ is a summation as i takes all values from 1 to k. This is also commonly written $\sum_{i}^{k}$. In this text the still more abridged notation $\Sigma^{k}$ is sometimes used. Great care is necessary with double and triple summations.

$$\sum_{i=1}^{u}\sum_{j=1}^{v} A_{1i}A_{2j}a_{1i}a_{2j}$$

is a summation of uv terms in which 1 and 2 do not vary and i takes all values from 1 to u simultaneously in $A_{1i}$ and $a_{1i}$, and, for each value of i, j takes all values from 1 to v (including the case in which j = i) simultaneously in $A_{2j}$ and $a_{2j}$. If the case i = j is to be excluded, it must be indicated, usually by the notation j ≠ i at the right of the equation. Another common situation is when j takes all values less than i, in which case the notation j < i is recorded. In this case a double summation $\sum_{i=1}^{u}\sum_{j=1}^{u-1}$, j < i, would have u(u-1)/2 terms.

Summations are sometimes indicated in mathematical treatises (seldom in statistical ones) by the use of a dummy index, subscript or superscript. When this index appears twice in an expression, it represents a summation; thus $a_{ij}a_{ik} = a_{1j}a_{1k} + a_{2j}a_{2k} + \cdots + a_{nj}a_{nk}$, in which 1, 2, ..., n are all the possible values that i can take.
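The counting rules for double summations, and the meaning of a repeated dummy index, can be checked by brute force. A Python sketch; all names and numbers are hypothetical.

```python
def double_sum(A1, a1, A2, a2):
    """Σ_i Σ_j A1[i]·A2[j]·a1[i]·a2[j], i over u values, j over v values (j = i included)."""
    return sum(A1[i] * A2[j] * a1[i] * a2[j]
               for i in range(len(A1))
               for j in range(len(A2)))

def count_pairs_j_less_than_i(u):
    """Number of (i, j) terms when j runs only over values less than i."""
    return sum(1 for i in range(1, u + 1) for j in range(1, u + 1) if j < i)

def dummy_index_sum(a, j, k):
    """a_ij a_ik with the repeated index i summed over every row of a."""
    return sum(row[j] * row[k] for row in a)

A1, a1 = [1, 2], [1, 1]          # u = 2
A2, a2 = [3, 4, 5], [1, 1, 1]    # v = 3
print(double_sum(A1, a1, A2, a2))               # 36, from uv = 6 terms
print(count_pairs_j_less_than_i(6))             # 15 = 6 × 5 / 2
print(dummy_index_sum([[1, 2], [3, 4]], 0, 1))  # 1×2 + 3×4 = 14
```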
Σ, - the standard deviation. When two standard deviations, not readily definable by different subscripts, are involved, σ and Σ may be used; e.g., σ designates a standard deviation in a narrow-range sample and Σ designates it in a wide-range sample.

t, T, τ, θ, and Θ

t, - "Student's" t is a critical ratio, a deviation in terms of its standard deviation. The t-distribution is symmetrical, but non-normal, because, and only because, the number of cases in the sample is small.

$t^2 = F_{1i}$; see $F_{ij}$.

t , - a tabled entry. Ch. XIII, Sec. 12, Table 

XIII I. 

θ, - a common designation for an angle.

θᵢ, - as used herein, θᵢ is the function, depending on i, the d.o.f., involved in normalizing a variance ratio. See Ch. IX, Sec. 5, [9:23].

u, U , v y and 'T 

u, - commonly used to indicate a tabled entry.

see n t\ 

v and V 

v, - in connection with percentiles, v is the value of the lower limit of the interval in which the (100p) percentile lies.

V, - the variance: V₁ is the variance of the first variable, V₂ that of the second, etc. For exceptions, see Vᵢ and Vⱼ.

Vᵢ and Vⱼ, - as here used in connection with variance ratios, Vᵢ indicates a variance based upon i d.o.f. and Vⱼ one based upon j d.o.f.


$V_{ij}$, - in connection with a normal distribution, the variance of the portion of a unit normal distribution lying between $x_i$ and $x_j$. See [8:29].
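The variance of a portion of the unit normal curve can be computed from the ordinates and areas at its limits. This Python sketch uses the standard truncated-normal moment formulas, which may be arranged differently from [8:29]; treat it as an independent check under that assumption. The function names are hypothetical.

```python
import math

def ordinate(x):
    """Unit normal ordinate at x."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def cumulative(x):
    """Area of the unit normal curve below x."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def variance_of_portion(xi, xj):
    """Variance of the part of the unit normal distribution lying between xi and xj (xi < xj)."""
    q = cumulative(xj) - cumulative(xi)                         # area of the portion
    mean = (ordinate(xi) - ordinate(xj)) / q                    # mean of the portion
    second = 1 + (xi * ordinate(xi) - xj * ordinate(xj)) / q    # mean square about 0
    return second - mean ** 2

# The half of the curve above the mean has variance 1 - 2/π, about .3634.
print(variance_of_portion(0.0, 40.0), 1 - 2 / math.pi)
```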

w and V 

w, - designates a weight; e.g., w₁, w₂, w₃ may designate the weights attaching to the first, second, and third variables.

x, X, and ξ

x, - a score as a deviation from the mean.

x, - not infrequently a standard score, as in Table XV C.

x, - not infrequently a raw score, i.e., it = X as herein used.

X, - a raw score.

X̄, - the mean of the X's.

ξ, - a score as a deviation from an arbitrary origin.

y and Y

Same meanings for y, Y, and η, but pertaining to a second variable, as given for x, X, and ξ as pertaining to a first variable.

$Y_m(x)$, - a common designation of a Bessel function of the second kind.

z, Z, and ζ

z, - not infrequently a standard score.

z, - the ordinate in a unit normal distribution.

z, - R. A. Fisher's z. See Ch. XV, Sec. 1, 1938, Fisher and Yates.
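Fisher's transformation of r into z is z = ½ logₑ[(1 + r)/(1 - r)]. The Python sketch below assumes that definition and the usual large-sample standard error 1/√(N - 3). The standard error is an assumption here and should be checked against the text's own formula. The function names are hypothetical.

```python
import math

def r_to_z(r):
    """Fisher's z = ½ ln[(1 + r)/(1 - r)], the same as atanh(r)."""
    return 0.5 * math.log((1 + r) / (1 - r))

def z_to_r(z):
    """Inverse transformation: r = tanh(z)."""
    return math.tanh(z)

def sigma_z(n):
    """Approximate standard error of z for a sample of n cases."""
    return 1 / math.sqrt(n - 3)

z = r_to_z(0.6)
print(round(z, 4), round(z_to_r(z), 4), sigma_z(28))
```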

see n y B . 

and 


0 , - de fined under f. 
χ

χ, - the square root of χ². χ², though frequently derivable from categorical data and frequencies in classes, is, with varying degrees of closeness, the sum of a number of independent squared deviates of a unit normal distribution. $\chi^2/i = F_{i\infty}$; see $F_{ij}$.

As herein used, i designates the number of degrees of freedom in χ².
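For frequencies in classes, χ² is usually formed as the sum, over classes, of (observed - expected)²/expected. In the Python sketch below, that form, the names `f_o` and `f_e`, the die-throwing frequencies, and the single linear restriction used to count d.o.f. are all illustrative assumptions.

```python
def chi_square(observed, expected):
    """χ² = S (f_o - f_e)² / f_e, summed over the classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of classes")
    return sum((f_o - f_e) ** 2 / f_e for f_o, f_e in zip(observed, expected))

# Hypothetical frequencies: 60 throws of a die, 10 expected on each face.
observed = [8, 12, 9, 11, 6, 14]
expected = [10] * 6
chi2 = chi_square(observed, expected)
i = len(observed) - 1        # one linear restriction: the totals agree
print(chi2, chi2 / i)        # χ²/i may be compared with F_i∞
```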

ω and Ω

ω, - as a subscript, is used in connection with corrections for attenuation to indicate a true score in the second variable; ∞ indicates the true score in the first variable.

SECTION 2. CERTAIN LESS COMMON NON-LITERAL SYMBOLS

≈, or ≐, - is approximately equal to.

≃, - is asymptotically equal to. Frequently, in statistical equations, the nearness to equality increases as N increases.

=G=, - is equivalent to, in terms of the designated function relationship; thus, if two achievement tests are equated on the basis of age norms, we might write X = 78 =G= Y = 152, meaning that 78 is the same age norm on test X as is 152 on test Y.

∝, - varies as.

>, - is greater than.

<, - is less than.

≥, - is greater than, or in the limiting case, equal to.

≤, - is less than, or in the limiting case, equal to.

|x|,- the absolute value of x, or the value re- 
gardless of sign. 

x!, or |x̲, - factorial x. If used when x is not an integer, it can then represent Γ(x + 1), but if so used, a specific statement in the accompanying text is necessary.

Γx, - see under Greek letter gamma.

A·B, - the product of A and B. Also, in dealing with vectors, this is the "dot product," or scalar product, of A and B.



$|a_{ij}|$, - the determinant whose element in the i-th row and j-th column is $a_{ij}$.

$\|a_{ij}\|$, - the matrix whose element in the i-th row and j-th column is $a_{ij}$.

$f(x)\Big|_a^b = f(b) - f(a)$.


|abc...|, - is the determinant

$$\begin{vmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{vmatrix}$$

‖abc...‖, - is the matrix with the same elements as in the determinant |abc...|.
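A third-order determinant such as |abc| can be evaluated by expansion along its first row. A small Python sketch; the function name `det3` is hypothetical.

```python
def det3(rows):
    """|abc|: expand the 3 × 3 determinant along its first row."""
    (a1, a2, a3), (b1, b2, b3), (c1, c2, c3) = rows
    return (a1 * (b2 * c3 - b3 * c2)
            - a2 * (b1 * c3 - b3 * c1)
            + a3 * (b1 * c2 - b2 * c1))

print(det3([[2, 0, 0], [0, 3, 0], [0, 0, 4]]))  # 24
print(det3([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # 0: the rows are linearly dependent
```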


1TTT) = ( a + b) 


∞, - as a subscript in connection with corrections for attenuation, is used to indicate a score in the first variable having no chance error. ω is similarly used with reference to the second variable.

→, - approaches; thus n → ∞ is read "as n approaches infinity."

)i(, - reversed parentheses in a sequence written 1, 2, ... )i( ..., k assert that all the values 1, 2, etc., to k, except the value i, are included. E.g., should i have the specific value 2, the sequence would be 1, 3, 4, ..., k. Where no ambiguity results, the commas may be omitted and this last series written 134...k.


[ ], - not infrequently square brackets are used to designate the mean of whatever is within the brackets. E.g., for a sample of N, $[X] = \Sigma X/N = M$; $[x_1x_2] = \Sigma x_1x_2/N = c_{12}$. Textual explanation of this use of [ ] is necessary, for otherwise it may be interpreted as a symbol of aggregation.
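The bracket notation reads directly as "take the mean." The Python sketch below forms [X], the covariance $c_{12} = [x_1x_2]$, and from these the product-moment r as $c_{12}/\sqrt{V_1V_2}$. The function names and scores are hypothetical.

```python
import math

def bracket_mean(values):
    """[X] = ΣX/N, the mean of whatever is within the brackets."""
    return sum(values) / len(values)

def covariance(x1, x2):
    """c12 = [x1 x2] = Σx1x2/N, x1 and x2 being deviations from their means."""
    m1, m2 = bracket_mean(x1), bracket_mean(x2)
    return bracket_mean([(a - m1) * (b - m2) for a, b in zip(x1, x2)])

def product_moment_r(x1, x2):
    """r12 = c12 / sqrt(V1 V2)."""
    return covariance(x1, x2) / math.sqrt(covariance(x1, x1) * covariance(x2, x2))

X1 = [1, 2, 3, 4, 5]
X2 = [2, 1, 4, 3, 5]
print(bracket_mean(X1), covariance(X1, X2), product_moment_r(X1, X2))
```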
SECTION 3. A FEW MATHEMATICAL TERMS (not elsewhere defined) COMMONLY HAVING STATISTICAL SIGNIFICANCE


Affine, - an affine transformation is of the form

$$y_1 = a x_1 + b x_2 + c, \qquad y_2 = A x_1 + B x_2 + C.$$

When

$$\begin{vmatrix} a & b \\ A & B \end{vmatrix} = aB - Ab \neq 0,$$

this covers the fundamental types of general utility in statistics: translations, rotations, stretchings and shrinkings, and reflections. The affine transformation carries parallel lines into parallel lines and finite points into finite points. The rotation is an isogonal affine transformation in that it does not change the size of angles.
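Two of the properties named above can be checked numerically: parallel segments stay parallel, and a rotation preserves angles. The Python sketch below uses hypothetical coefficients and helper names.

```python
import math

def affine(point, a, b, c, A, B, C):
    """y1 = a x1 + b x2 + c,  y2 = A x1 + B x2 + C."""
    x1, x2 = point
    return (a * x1 + b * x2 + c, A * x1 + B * x2 + C)

def direction(p, q):
    return (q[0] - p[0], q[1] - p[1])

def cross(u, v):
    """Zero when the two direction vectors are parallel."""
    return u[0] * v[1] - u[1] * v[0]

coef = dict(a=2, b=1, c=3, A=0, B=1, C=-1)    # aB - Ab = 2, not zero
assert coef["a"] * coef["B"] - coef["A"] * coef["b"] != 0

# Two parallel segments, both with direction (1, 1), stay parallel.
seg1 = [affine(p, **coef) for p in [(1, 0), (2, 1)]]
seg2 = [affine(p, **coef) for p in [(0, 1), (1, 2)]]
print(cross(direction(*seg1), direction(*seg2)))  # 0

# A rotation (a = B = cos t, b = -sin t, A = sin t) keeps a right angle a right angle.
t = math.radians(30)
rot = dict(a=math.cos(t), b=-math.sin(t), c=0, A=math.sin(t), B=math.cos(t), C=0)
u, v = affine((1, 0), **rot), affine((0, 1), **rot)
print(round(u[0] * v[0] + u[1] * v[1], 12))       # dot product of the images: 0.0
```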


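Bernoullian numbers, - they are:

B₁ = 1/6; B₂ = 1/30; B₃ = 1/42; B₄ = 1/30; B₅ = 5/66; B₆ = 691/2730; B₇ = 7/6; B₈ = 3617/510; ...

In general,

$$B_n = \frac{(2n)!}{2^{2n-1}\pi^{2n}}\sum_{i=1}^{\infty}\frac{1}{i^{2n}}.$$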
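In the older numbering used above (B₁ = 1/6, B₂ = 1/30, ...), the Bernoullian numbers are the absolute values of the even-indexed numbers in the modern numbering (B₂ = 1/6, B₄ = -1/30, ...). The Python sketch below assumes that correspondence. It generates the modern numbers exactly from the recurrence Σₖ C(m+1, k)Bₖ = 0 and checks the series formula numerically. The function names are hypothetical, and `math.comb` requires Python 3.8 or later.

```python
import math
from fractions import Fraction

def modern_bernoulli(m_max):
    """B_0 ... B_m_max from the recurrence sum_{k=0}^{m} C(m+1, k) B_k = 0 (B_1 = -1/2)."""
    b = [Fraction(1)]
    for m in range(1, m_max + 1):
        b.append(-sum(math.comb(m + 1, k) * b[k] for k in range(m)) / (m + 1))
    return b

def bernoullian(n):
    """B_n in the older numbering: the absolute value of the modern B_2n."""
    return abs(modern_bernoulli(2 * n)[2 * n])

def bernoullian_from_series(n, terms=100_000):
    """(2n)! / (2**(2n-1) π**(2n)) × Σ 1/i**(2n), truncated after `terms` terms."""
    s = sum(1 / i ** (2 * n) for i in range(1, terms + 1))
    return math.factorial(2 * n) / (2 ** (2 * n - 1) * math.pi ** (2 * n)) * s

print([str(bernoullian(n)) for n in range(1, 9)])
print(float(bernoullian(2)), bernoullian_from_series(2))
```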


Combination, - the number of possible combinations of n different things m at a time is designated C(n,m), or ₙCₘ, or Cⁿₘ, or $\binom{n}{m}$. $C(n,m) = \frac{n!}{(n-m)!\,m!}$. See Permutation.
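The counts C(n,m) and P(n,m) differ only by the m! arrangements of each combination. A Python sketch with hypothetical function names, cross-checked against the standard library's `math.comb` and `math.perm` (Python 3.8 or later).

```python
import math

def permutations_count(n, m):
    """P(n, m) = n! / (n - m)!"""
    return math.factorial(n) // math.factorial(n - m)

def combinations_count(n, m):
    """C(n, m) = n! / [(n - m)! m!]"""
    return math.factorial(n) // (math.factorial(n - m) * math.factorial(m))

# Each combination of m things can be arranged in m! orders, so P(n, m) = C(n, m) × m!.
n, m = 5, 2
print(permutations_count(n, m), combinations_count(n, m))  # 20 10
```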

Factor, - a factor, as used in mental factor analysis, is a linear function of the scores upon a number of measures (usually mental tests) which, in relation to other factors, i.e., other linear functions of the same scores, has certain unique properties. Different considerations lead to different factorial solutions. Involved are concepts of (a) independence, orthogonality, and maximal variance (Hotelling, Kelley, et al.), (b) simple structure (Thurstone, et al.), (c) social importance (Kelley), and various hypotheses, general (Spearman, et al.), group (Thomson, et al.), bi-factor (Holzinger, et al.), etc., which qualify the conditions that are imposed.

Geometric series, - one in which the ratio of each term to the following is constant. Let the geometric series be $a + ar + ar^2 + \cdots + ar^n$. The sum of these (n+1) terms is $\frac{a(1 - r^{n+1})}{1 - r}$.
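The closed form can be checked against direct addition; using exact fractions avoids rounding. A Python sketch with a hypothetical function name. The case r = 1 is handled separately because the formula then divides by zero.

```python
from fractions import Fraction

def geometric_sum(a, r, n):
    """Sum of the n + 1 terms a + ar + ar**2 + ... + ar**n."""
    if r == 1:
        return a * (n + 1)       # the closed form divides by zero when r = 1
    return a * (1 - r ** (n + 1)) / (1 - r)

# Exact check with fractions: 3 + 3/2 + 3/4 + 3/8 + 3/16 = 93/16.
print(geometric_sum(Fraction(3), Fraction(1, 2), 4))   # 93/16
print(geometric_sum(2, 3, 3), 2 + 6 + 18 + 54)        # 80.0 80
```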

Integral function , - one which can be so written 
that the variable, or variables, have positive 
exponents and do not appear in the denominator 
of any term. 

Integration is the process of summing parts (the infinitesimal elements of area of the calculus) to obtain a whole. An integral is the result of an integration. A definite integral is the result of integration between definite, or assigned, values.

Lexian ratio, designated Q by Lexis, = χ²/d.o.f.

Logarithm, - if given $y = a^x$, then x is the logarithm of y to the base a. Two bases are in common use, the base e and the base 10, yielding natural logarithms, designated ln or logₑ, and common logarithms, designated log or log₁₀.

With base e we have $y = e^x$, and the important property $\frac{d}{dx}e^x = e^x$ holds. With base 10 we have $y = 10^x$, and the important property $\log(10^p y) = p + \log y$ holds; thus log 172. = 2 + log 1.72 = 2.235528..., in which 2 is called the characteristic and .235528... the mantissa, p being a positive or negative integer.

The relationship between natural (or Napierian) and common (or Briggs) logarithms:

log x = .43429 44819 03252 ... ln x
ln x = 2.30258 50929 94046 ... log x
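The conversion constants and the split of a common logarithm into characteristic and mantissa can be verified with Python's standard `math` module. The helper name is hypothetical.

```python
import math

LOG10_E = math.log10(math.e)   # .43429 44819 03252 ...
LN_10 = math.log(10)           # 2.30258 50929 94046 ...

def characteristic_and_mantissa(y):
    """Split log10(y) into an integral characteristic p and a mantissa in [0, 1)."""
    log_y = math.log10(y)
    p = math.floor(log_y)
    return p, log_y - p

print(characteristic_and_mantissa(172))      # characteristic 2, mantissa .235528...
print(characteristic_and_mantissa(0.0172))   # characteristic -2, same mantissa
x = 50.0
print(math.log10(x), LOG10_E * math.log(x))  # log x = .4343 ln x
```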

Orthogonal transformation, - one which transforms from one set of rectangular coordinates to a second set; e.g., simple rotations and translations.

Permutation, - the permutations of n different things m at a time consist of all possible combinations m at a time and, for those in each combination, all possible arrangements. The number of such is designated P(n,m), or ₙPₘ, or Pⁿₘ. $P(n,m) = n!/(n-m)!$. See Combination.

Quadrature ,- the process of finding a rectangular 
area equal to that of a given surface. Usually 
one only of the bounds of this surface is 
curved. 
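Quadrature of a curved boundary is usually done numerically. The Python sketch below assumes Simpson's rule, one common choice and not necessarily the method of this text. It finds the area of the unit normal curve from the mean to x = 1, the quantity tabled as I. The function names are hypothetical.

```python
import math

def simpson(f, a, b, n=100):
    """Approximate the area under f from a to b by Simpson's rule with n (even) strips."""
    if n % 2:
        raise ValueError("Simpson's rule needs an even number of strips")
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def unit_normal(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

area = simpson(unit_normal, 0.0, 1.0)
print(area, 0.5 * math.erf(1 / math.sqrt(2)))   # both about .341345
```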

Ratio test for convergence of an infinite series, - if the absolute value of the ratio of the (n+1)th term to the nth term, as n → ∞, is less than 1, the series is convergent; if it equals 1, the test is inconclusive; and if it is greater than 1, the series is divergent.

Rational function, - one which can be written so that the variable, or variables, do not have fractional exponents, or the equivalent radical form.

Rational integral function, - one which can be so written as to be both rational and integral.

Vector, - a line having both magnitude and direction.
PAGE

Re percentiles:


146 

p> p P > v f p > 
v and F P 

P 

p 

[4:01] 

146

i' and /' 

p p 


I 49 

crp an d q 

[4: 03] 

149

p 

A generalized mean 

[6:01] 

201 

K 

[6: 03] 

203 


[6:07] 

204 

V 

[6:04] 

203 


[6:05] 

203 


[6:46] 

217 

cr 

[6:06] 

203 


[6: 44] 

217 

V . 

[6:09] 

204 

V and V 

[6: 10] 

205 

*5 

M 

[6: II] 

206 

''V/ 

X 

[6: 1 4] 

208 


[6: 15] 

1 09 

[5: IS] 

209 


[6: IS] 

209 

V( M 1 - M 2 ) 

[6: 19] 

210 

Moment from fixed point 

[6:20] 

210 
[6:21]

2 i i 

1 

[6: 22] 

213 

¥ 

[6: 23] 

213 

Fisher's notation: 


212 

nn[, m 2 , niy m } , 



f -^ 2 . l ~^ 2 * ^3 1 [~L ^ 



Yule and Kendall's


212 

notation: 4 and £ 



M 3 

[5:211] 

213 

r 

[6: 25] 

213 

V 

[6: 10] 

213 

¥ 

[6: 26] 

213 

h 

[6: 27] 

214 

ft* 

[6: 28] 

214 

MV 

[8: 29] 

214 

M 2 V 

[6:30] 

214 

j 

[6:31] 

214 


[6:32] 

215 

A; 

[6:33] 

215 

k^,- Fisher 

[6:34] 

215 
Fisher

[6:35] 

215 

fc a> ,- Fisher 

[6:36] 

215 

Fisher 

[6:37] 

215 

x,,- Fisher 

[6:38] 

216 

/< 2 »“ Fisher 

[6:39] 

216 

x,,- Fisher 

[6: 40] 

216 

x, ff - Fisher 

[6: 4l] 

216 

€ 

[ 6 : 42 ] 

217 

i,- interval 


217 

Arb. Or, 

[ 6 : 43 ] 

217 

x, - a devi ate from M 

[6: 45] 

217 


[6:47] 

217 


[6:48] 

217 

V,- Sheppard 1 s 

[6: 49] 

218 

correct ion 

S AV“ Sheppard's 

[6:50] 

218 

correct ion 

s. 

[6:51] 

221 

5 2 

[6:52] 

221 

S 3 

[ 6 : 53 ] 

221 


[6:54] 

221 

Yj , - cf . [13:147] 

[6:55] 

223 


[6:56] 

224 


[6: 57] 

224 


[6:58] 

224 
V

cr 

[ 6:59] 

224 


[13:77] 

524 


[13:79] 

525 


[13:81] 

525 

a a 

[ 6:60] 

224 

A; 

[ 6:61] 

224 


[ 6: 62] 

225 

V 

Mu 

[ 6:63] 

225 

. 4 

[ 6:64] 

225 

A. D. average deviation 

[ 6:55] 

227 


[ 6: 68] 

229 

A. D. from P 

[ 6:66] 

228 


[ 6:67] 

228 

h,~ number of measures 


228 

below P 

cr (A. D. from P) 

[ 6:69] 

229 

cr (A. D. } 

[ 6: 70] 

230 

c (P P p * covariance 

[ 6:71] 

231 

between percentiles 

V( P -p o 

A p p y 

i 6:72] 

231 

P , - pe rcent i 1 e measu re 

[ 6:73] 

231 

of var iabi 1 ity 

* 

v 

[ 6:74] 

231 

quart ile deviation 

[ 6:75] 

232 

F 

Q . 

[ 5: 76] 

232 

Re V, l, and Arb. Or. 

[ 7: 20] 

260 

Vo 

X 

[ 7: 21] 

260 

f, - ord inate in best 

[ 7:23] 

261 


fit- parabol a 
[7:24]

26 1 

V(Mop 

[7: 27] 

262 

V,o 

[7:28] 

262 

V f 

[7: 29] 

265 

Geometric mean 

[7:30] 

266 

Harmonic mean 

[7:31] 

272 

Differential equation 

[8:0 i] 

233 

of normal curve 

z,- unit normal 

[8:02] 

288 

distribution 

[10:19] 

348 

p,- in a normal 

[8:03] 

238 

distribution 

y,~ normal distribution 

[8:04] 

289 


[8: 34] 

30! 

fi x ,- normal 

[8:06] 

290 

A general i zed mean 

[7:01] 

23 4 

^Mdn 

[7:02] 

240 

/3 2 ,- Pearson 

[7: 03] 

244 

Equation of distribution 

[7:0s] 

245 

for which q; - cr Mdn 

Ku,~ percentile measure 

[7:08] 

246 

of kurtosis 

^K U 

[7:09] 

246 

Mesokurtic Ku 

[7: II] 

246 

Pearson 

[7: 12} 

249 

g 1> - Fisher 

[7: 13] 

249 

V 

a ~ 

[7: 14] 

249 


[7: 15] 

250 


percentile measure 
of asymmetry 

[7: 16] 

250 

V 

A c 


[7: 17] 

250 

Sfc,- 

percent i le measure 
of skewness 

[7: 18] 

250 

y Sk 


[7: 19] 

251 

Mean 

deviation,- normal 

[8:07] 

[8: 20] 

290 

292 

H-o 

*/, - normal 

[8:08] 

290 

Mean 

\ x~ j , - normal 

[8:09] 

291 

• 

normal 

[8: 10] 

291 

Mean 

\x F) \ , - no rmal 

[8: 1 1] 

291 


normal 

[8: 12] 

291 

* " 

normal 

[8: 13] 

291 

Me » " 

normal 

[8: 14] 

291 

Mio » ‘ 

- normal 

[8: 15] 

291 

A 

normal (Pearson) 

[8: 16] 

291 

AV- 

normal (Pearson) 

[8: 17] 

291 

?i'- 

normal (Fisher) 

[8: 18] 

29 1 

y?.'~ 

normal (Fisher) 

[8: 19] 

291 

<?,- , 

normal quart! le 
deviation 

[8:21] 

292 

/v,- 

normal 

[8: 22] 

292 

x (or z) , - a standard 
score 

[8: 23] 

293 
P.E.

[8:24 ] 


294 

y r ? - normal 

[8:24a] 


295 

mean deviation of 

[8:25 ] 


296 

normal tail portion 

[8:26 ] 


296 

d. . mean deviation of 
normal portion 

[8: 27 ] 


297 


[8:28 ] 


298 

[8:29 ] 


298 

dp 

[8:28a] 


298 

d z 

[8: 28b] 


298 

'V.ij 

[8:31 ] 


299 

j 

[8:33 ] 


299 

normal 

[8:35 ] 


302 

mean ^ 2 

[8:36 ] 


308 

(or K.) 

[8:37 ] 


308 

[8:38 ] 


308 

F. . , - variance ratio 

[9:21 ] 


308 

F lfl0f - a function of ^ 2 

[8:39 ] 


309 

Mean class frequency 

[9:0! ] 


316 

Variance of cl ass 

[9:02 ] 


316 

frequency 




, - class frequency 

[9:03 ] 


316 

- class frequency 

[9:04 ] 


316 

Ap- class frequency 

[9:05 ] 


316 

/3 2 , « class frequency 

[9:06 ] 


316 

X 2 and contingency 

[9:07] to [9: II] 

31 8 

table notation,- 



f,. X.r.. v. 

[9: 16] to [9:20] 

325 

'V/ 

d.o. f. , / t , / st 



Linear restriction 

[9: 12] to [9: 13] 

322 

F. . variance ratio 

[9:21] 

326 

f.. 

* ) 

[9:22] 

326 

0, 

i 

[9: 23] 

326 

d normal izi ng 

[9:24] 

327 

transformat ion 



x normal izing 

[9: 25] 

327 

transformat ion 



P. . and R . 

[9:26] 

329 

Additive property 

[9:27] 

331 

of } 2 >s 



x 0 

[10:01 ] 

333 


1 1 0: 0 la] 

333 

a 01 

[10:02 ] 

333 

^01 

[10:03 ] 

333 

x Q and x x (See 

[10:04 ] 

333 

also [11:97]) 

[10:04a] 

333 

z Q and z x 

[10:05 ] 

333 


[10:05a] 

333 
Preceding subscript Table X A 334

notation indicating 
corrections in cor- 
relation coefficients: 



X 0 quadric regression 

[10:06] 

335 

r 

[10:07] 

339 

Comparable measures 

[10:08] 

340 

cr i .2 

[10:09] 

342 


[10:24] 

350 


[10:49] 

364 

y, - bivari ate normal 

[10: 10] 

343 

distri bution 

[10:21] 

348 

A subscript notation 
(See also Chapter XI, 
Sec. 3, pp. 435-436) 


344 

analysis of variance 

[10: 1 1] 

345 

(See also [l 1:99]) 

[10: 16] 

3 46 


[10: 17] 

347 

v 2ml 

[ 10: 12] 

346 


[10: 18] 

347 


[10: 15] 

346 

*2.1 

[10:23] 

349 

CT 2.1 

[ 1 0 : 09] 

342 


[10:24] 

350 


[10:49] 

364 


x 

b 2i and & 12 

[10:25] 

350 

[ 10: 26] 

350 


[10:30] 

352 


[10:31] 

352 

r 12 

[10:27] 

351 

[10:29] 

352 

c 12 ,- covariance 

[10:28] 

351 

d. o* f. equation 

[10:32] 

354 

2 X - a 1 inear 

[10:33] 

355 

restriction 

2X X , -ali near 

[10: 34] 

355 

rest rict ion 

Null hypotheses 

[10:36] 

356 

re W 0 and b Q1 

[10:37] 

356 

F 1 N _ 2 , - re the mean 

[10:38] 

356 

re the 

[10:39] 

356 

regression coefficient 

, - correlated data 

[10:40] 

357 

>'('*>„ p 

[10:41] 

357 

[13: 143] 

553 

L,,.,, re hypothesis 

[10:42] 

358 

r = 0 

z,- Fisher f s r i nto z 

[10:43] 

359 


[10:43 a] 

359 

cr 7 

[10:44] 

359 

(z-z) critical ratio 

[10:45] 

360 

CT r ,- cf. [13: 145] 

[10:46] 

360 


[10:46a] 

360 
r² adjustment for

coarseness of grouping 

V (X Q ) , - variance of 

a point on the re- 
gression line 

Difference formula for r 

Sum formula for r 

Sum and difference formula 

Mean of N ranks 

V of N ranks 

p,- rank r 

r from p 

°P 

cr.,- r from p 

r from Pearson T s p 

Horn f s correction for 
tied ranks 

Applying Horn T s correction 

f - dichotomous series 

K dichotomous series 

c y i s d ichotomous 

and x cont inuous 

h 

xy 

b 

yx 

r xy bi seri al product 

moment r 


[ 10 : 48 ] 

362 

[ 10 : 50 ] 

364 

[ 10 : 52 ] 

365 

[ 10 : 53 ] 

366 

[ 10 : 54 ] 

356 

[ 10 : 55 ] 

366 

[ 10 : 57 ] 

366 

[ 10 : 58 ] 

367 

[ 10 : 5 . 9 ] 

367 

[ 10 : 60 ] 

367 

[ 10 : 61 ] 

367 

[ 10 : 62 ] 

368 

[ 10 : 63 ] 

368 

[ 10 : 64 ] 

369 

[ 10 : 65 ] 

371 

[ 10 : 66 ] 

371 

[ 10 : 67 ] 

371 

[ 10 : 68 ] 

372 

[ 10 : 69 ] 

372 

[ 10 : 70 ] 

372 
X

[10:7 0 

372 

Y 

[10:72] 

372 

F i N _ 2 for the difference 

[10:73] 

372 

between means of the 
upper and lower classes 
of the dichotomy 



b xy / y' assumed 

continuous 

[10: 74] 

374 

r ; , - biserial r 

x y 

[10:75] 

374 

V 

[10:76] 

'374 

v , 

[10:77] 

375 

y . x 


V (bi serial r) 

[10:78] 

375 

[10:79] 

375 

a, /?, 7 , S, p, Q, p', Q* 

[10:80] 

380 

in 2X2-fold defined 



C xy *- 2X2-fold 



<fi, - product moment r 

[10:81] 

381 

in 2X2-fo1d 



fa 
x y 

[10:82] 

381 

X 

[10:83] 

381 

to test (p-p) 

[10:84] 

381 




f l(N-2) t0 teSt ^ X y “^xy) 

[10:85] 

381 


[ 10 : 86 ] 

382 

r t tetrachoric r 

[10:87] 

383 
V(r_t)

[10:88 ] 

386 

r t when dichotomic 

[10:89 ] 

386 

lines are at the 
medians 

[10: 89a] 

3 87 

V(r) 

[10:90 ] 

387 

r~ r between class 

m 

[10:91 ] 

393 


means and continuous 
vari ate 



[10:92 ] 

393 

C X 

m 

r 

[10:93 ] 

393 

m x v 
m 3 m 

[10:94 ] 

393 


[10:95 ] 

393 

c xy 
nr 7 m 

x , - a weighted sum 

[ 10: 96a] 

395 


[ 10: 96b] 

395 

V 

[10: 97 ] 

39 5 

CL 

S' 

[10:98 ] 

39 5 

cl/3 

Lx/3 

[10:99 ] 

395 

[10: 100] 

396 


[10: 10 i] 

396 

Lg ’ r ij » Lh ^erage 

[10: 100] 

396 

i ntercorrelations 

Li 

[10: 102] 

397 


[10: 105] 

397 

Loi 

[10: 103] 

397 

F variance of 

[10: 106] 

398 

a difference 
Defining an obtained

[11:01 ] 

401 

measure x lf a true 

[11:01a] 

402 

measure x , and an 

Cz) 

error in the obtained 
measure e 2 

[1 1:02 ] 

402 

Defining half scores x^ 

(or x x ) and 
~2 

[11:03 ] 

402 

Defining second variable 
x„. half scores x,, and 
x^ and true scores x^ 

[i 1 : o I b] 

402 

Defining a criterion 
variable x Q1 and a true 
criterion variable x^ 

[1 1:04 ] 

403 

V(X 3 -x 5 ) 

[1 1:05 ] 

403 

cr ± #w ,- Ru 1 on 1 s standard 
error of estimate 
formula 

[l 1:06 ] 

403 

r x ,- Kuder-Richardson rel i- 
abi 1 i ty coefficient 

[11:07 ] 

404 

r, = r. K 

1 ;>5 

2 

[11:08 ] 

405 

r , - Spearman-Brown 
n 1 

formula 

[1 1:09 ] 

406 


[1 l:09a] 

406 

n 

[ 1 1 :09b] 

406 

F 

n 

[1 1 :09c] 

406 

i*i Spearman-Brown 
formula 

[II: 10 ] 

407 
[11:11]

407 

*1 


[II: 12] 

407 

V (sum) 

[II: 13] 

407 

V 

- variance error 

of estimate 

[II: 14] 

407 

M 

CO 


[ll: 15] 

408 

V 

CO 


[l i: 16] 

408 

So 


[11:17] 

409 

r ico 


[11:18]' 

409 

X 

CO 


[II: 19] 

409 

X 

CO 


[1 1:20] 

409 

v * 


[11:21] 

410 

v »-l 

vari ance e rror of 
est i mate when X ± i s 
regressed 

[11:22] 

410 

C 0 1 

C'OXO 

[11:23] 

412 

r ool» 

- correction for attenu- 
ation in one variable 

[ 1 1 : 24] 

412 

r 

coco* 

- correction for 
attenuation 

[11:25] 

[13:85] 

412 

527 



[l 1:26] 

412 

S.1 


[11:27] 

413 
d₁₂ = z₁ - z₂ (z's are

[11:28 ] 

414 

standard scores) 



V(d 12 ) 

[1 1: 28a] 

414 

c(d l? d I II) 

[11:29 ] 

414 

r ( d 12 d I 11^ 

[t 1: 29a] 

415 

d 12.a>-y 

[11:30 ] 

415 


[i 1: 30a] 

415 

V(d 12 ) 

[1 1:31 3 

415 

[ 1 1 : 3 1 a] 

415 

d 

coy 

[11:32 ] 

415 

ndj 

[1 1 :32a] 

415 

d co/co,y/y 

[ 1 1 : 33 ] 

415 

^ ( d a/ co, y/ y) 

[1 1:33a] 

416 

d Z> y 

[1 1:34 ] 

416 


[1 1 : 34a] 

416 


[II: 34b] 

417 

d 12 and d S y 

squared critical ratios 

[l 1: 35 ] 

[1 1:36 ] 

417 

417 

Rel i ab i 1 ity of d- — 

[1 1:37 ] 

419 

r ± (triad) 

[1 1:38 ] 

420 

V^) 

[11:39 ] 

420 

r l 

[1 1:40 ] 

421 
[11:41]

M-21 


[11:42 ] 

422 

{ 

12345 

[l 1:45 ] 

424 

Best estimate of z 

CO 

[1 1:44 ] 

424 

Weighting- to al low for 

[11:45 ] 

424 

rel iabil ity 

Effect of range upon 

[I I: 46a] 

427 

rel iabi 1 ity 

V /V 

[1 1:47 ] 

427 

CO, CO 

S 2 

[1 1:48 ] 

429 

V 1.2 = V 1.2 

[1 1:49 ] 

430 

b u = 

[1 1:50 ] 

430 

Effect of an imposed 

[1 1:51 ] 

430 

v 2 IV 2 upon r 12 

[l 1:52 ] 

430 

Effect of normal selective 

[1 1:53 ] 

432 

processes upon r 1? 

[ 1 1: 55 ] 

432 

X Q in a 3-variable problem 

[ 1 1: 56 ] 

433 


[1 1:56a] 

433 

7 o 

[ 11:57 ] 

434 

b 1 and h 2 

[11:58 ] 

434 

a,- the constant term 

[1 1:59 ] 

434 

^0.12’“ a 3-variable 

[11:60] 

434 

multiple al ienation 

[1 1:62 ] 

435 

coefficient 

[1 1:81 ] 

441 
KEY PHRASE OR SYMBOL

and /? 2 ,- standard 

[ ! I: 61] 

435 

score regression 

[11:62] 

435 

coefficients 

[1 1: 82] 

44 1 


[1 1:83] 

441 

V (z Q ) , - anal ys i s of 

[1 I: S3] 

435 

variance 

r Q 

[11:64] 

435 

[11:65] 

436 


[i 1: 94] 

445 

v 

V Q *12 

[II: 56] 

436 

^ 0 , - analysis of variance 

[11:67] 

437 

d. o* f. 

[II: 6 3] 

437 

- P 2,N-3 t0 te3t ^0 Al2 ~ °) 

[1 1:69] 

437 

K , - analysis of variance 

[11:70] 

438 

when r i2 - 0 

d.o. f. 

[H:7I] 

438 

r i>N _ 3 to test (6 X -0) 

[11:72] 

438 

when r 12 = 0 

f ! N _ 3 to test (b 1 ~b 1 ) 

[Ils 73] 

436 

when r j - 0 

L * 2 

[1 1:74] 

439 


[1 1:83] 

441 


[1 1:95] 

445 

^1 ~ ^ 01.2 ~ et c. 

[1 1:75] 

439 

'°1 = h 01.2~ etC * 

[11:76] 

439 


[12:20] 

461 

*0. 2 

[11:77] 

439 
F 1,N-3 to test
^0 1.2 ~ °01. 2^ 

[I 1:78 ] 


[11:79 ] 

A,- major correlation 
determinant 

[11:80 ] 

[1 2:26 ] 

A 01 


S, - a major determinant 

[1 I: 84 ] 

r/, - a major determinant. 

[1 1:85 ] 

0, - a major determinant 

[1 1:86 ] 

Jf 0> M x , and # 2 

[1 1: 87 ] 

V,, and F 2 

[1 1:88 ] 

r 0 1- r 02 ’ and r i2 

[11:89 ] 

Regression constants 
<9^ t and h 2 

[1 1: 90 ] 
[11:91 ] 

[1 1:92 ] 

V 

r 0 *12 

[1 1:93 ] 

■^oAi i 2 c ! uac ^ r ' c regression 

[1 1:96 ] 
[13:03 ] 

*oNi 

[1 1:97 3 
[13:02 ] 

V Q analysis of variance 

[II: 101] 

d. o, f . 

[1 1: 102] 


V Q analysis of variance [l 1:104] 
need of

[II: 107] 

448 

second degree regression 

$F_{1,N-4}$, to test need of

[11:108] 

448 

third degree regression 

$\eta_{01}$, correlation ratio

[11:109]

450 

Giving estimates of 

[11:110]

451 

population variance 

to 


errors for different 

[11:113]

452 

parabolic estimates 

giving corrections 

[11:114]

452

to $r^2$ for shrinkage for

to


different parabolic

[11:116]

452 

regressions 

$\epsilon^2$ and $\epsilon$, an unbiased correlation ratio, cf. [13:10]

[11:117]

452
Interpolating values in a table,
601-603 

Interpolation, 27, 538-543, 622-623

accuracy of, 541-543, 627
two-point, 591, 622, 626
three-point, 541, 622, 626
four-point, 591, 622-623, 626
harmonic, 619, 623
inverse, 539-543, 622, 625
inverse, quadric, 543, 626

Interval, 122, 127, 539 
Interval, optimal,

Invariance, 23, 51, 5?3
Inverse matrix, 479-480 
Isogonal transformation, 691 

J-shaped curves, 240

(Includes "twisted J type")

James, Vern, 432
Jerome, Harry, 68
Judgment of irrelevance, 6-9
Judgment of sameness, 6-9


K (Fisher) statistics, 212, 
215-216 

Kelley, Truman L., 149, 231, 254, 289, 301, 326, 330, 393, 404, 418, 421, 424, 427, 430, 458, 499, 511, 588, 605, 612, 619, 623, 626, 627

Kelley-Salisbury iterative process, 457

Kendall, M. G., 212, 582, 615
Keynes, J. M., 47
Kondo, T., 616
Koren, John, 10
Kuder, G. F., 404

Kurtosis, 195, 237, 242, 246-247, 248

Kurtz, A. K., 616

Labeling classes, 129-140
Lag and lead, 492-497
Lagrange multipliers, 607-608 

Lagrangian interpolation coefficients, 538-542, 622-623, 626-627

Laplace, 216, 277 
Latin squares, 572, 620 
Lead and lag, 492-497 
Least squares, 37
Lee, Alice, 614, 615
Legendre, A. M., 614 

Lexis' ratio (Lexian ratio, coefficient of disturbancy), 198, 696

Likelihood ratio, 560
Limits, natural, 104, 195, 497 
Linear restrictions, 355, 482 
Logarithmic chart, 110 

Logarithms, 27, 31-33, 39-35, 270-271, 614-615, 621, 696

Logistic, 33 
Long, H. L., 529
Lowan, A. N., 621-622

Macdonnell, W. R., 263
Maclaurin series, 610
Maher, Helen C., 349
Makeham's mortality curve, 608

Mantissa of a common logarithm, 692

Maps, 99, 100, 155, 158 
Mathematical statistics, 7-8 
Matrices, 573-577 

addition of matrices, 574 
adjoint of a square matrix, 574, 576

associative properties, 576 
conjugate elements, 574 


Matrices

diagonal, 576-577 
distributive properties, 576 
elements, 573-574
identity matrix, 575-576 
inverse product, 576 
multiplication of, 472, 575, 576, 577
order of, 574

pre- and post-multiplication of, 575

scalar, 574-577
symmetric, 574 
transpose, 574, 575 

Matrices in connection with 
multiple correlation, 459- 
460, 475, 481 

Maximum likelihood, 246, 254 

McNemar, Quinn, 458 

Mean, 207-210, 234-235, 253 

Mean, computation of, 216-222 
Mean, properties of, 218-219 

Mean, variance error of M from 
correlated data, 357 

Mean, weighted, 505-506
Mean deviation, 227-230
Mean of N ranks, 366 
Median, 229, 235, 240-248 
Medical statistics, 620 
Mental measurement, 404-405
Mercator projection, 164-165
Merrington, M., 620, 625
Mid-parent measure, 338

Mills, J. P., 616
Mises, Richard von, 7
Mode, 128-129, 131-132, 236,
258-265

Modified Doolittle solution, 
458-471 

Molina, E. C. , 583, 624 
Moments, 196, 210-227, 507 

Moments, computation of, 217-222

Moments, infinite, 510
Moments, negative, 510 
Monotonic, 49, 52
Moul, Margaret, 616
Moving average, 258
Multimodality, 195, 237

Multiple correlation and regression, SEE Correlation, multiple

Multivariate analysis, 198 

National (U.S.) Bureau of Standards Tables, 621-622

Natural limits (SEE also "zero point"), 193, 194

Napierian logarithms, 697
Neyman, J., 559
Nomographs, 616

Nonlinear regression, 331, 442-453

Normal bivariate distribution, 348

Normal correlation, 341, 343, 348

Normal distribution, 36, 214, 259, 275-310, 347, 349, 512, 529-531, 532-535, 538, 564, 613, 615, 616, 621, 623, 626-627, 639-656

Normal equations, 456-457, 463-464, 471-475

Normal equi-probable and mean ranges, 529-538

Normal skewness, 316
Normal kurtosis, 316-317

Normal optimal interpercentile range, 231

Normalizing a distribution, 198, 592-593, 620
Normalizing a variance ratio, 310, 325-331, 599
Norton, H. W., 620, 624, 625
Null hypothesis, 332, 355-356

Number of classes to use, 133-134

Numerical solutions of complicated simultaneous equations, 591-592


Numerical solution of parabolic and exponential equations, 589-591

Observed value, 52
Observing eye, 88
Ogive, 36, 100, 141-149

One-sided alternative test of the mean, 567

One-third sigma rule, 223 
Operating characteristic curve, 557, 559, 560, 568

Optimal interval for graphic portrayal, 531-538

Ordered variable, 336 
Orthogonal, 53 
Orthogonal polynomial, 620-621 
Orthogonal transformation, 697 
Orthographic projection, 165 

p as a proportion in a normal distribution, 374

p as a proportion corresponding to a variance ratio, 491-492

Pairman, Eleanor, 614, 618 
Parabola, 33 

Parabolic equations of high degree, solution of, 589

Parabolic regression, 334-335, 355, 442-453

Parameter, 355, 432
Parent population, 4-5

Partial correlation, SEE Multiple correlation
Pearson Type III curve, 251-252, 264

Pearson Type IV curve, 264 
Peirce, B. O., 613, 619
Pentad function, 422

Percentile chart, 100, 141-149, 195

Percentiles, 95, 100, 141-149, 230-232

Period, 71, 193, 482-492

Periodogram analysis, 198, 482-492

Permutations, 693 
Perspective, 166-173 
Pictogram, 100, 151 
Pie chart, — SEE Circle chart 
Point binomial, 581-582
Poisson, 564

Poisson distribution, 316, 570, 582-583, 599, 614, 615
Polynomial distribution, 609 
Population, center of, 159-163 
Population, parent, 4-5 

Population statistics, 204-227, 451-452

Powys, A. W., 263
Predictand, 3 Q 7 
Prediction equation, 15-19 
Predictor, 387, 464 
Presentation of results, 79 
Pretorius, J., 616
Price index, 115 

Price ratios (SEE also Ratio), 
114, 268-269 

Primary table, 87 
Principal components, 572 
Probability, 28, 37, 78 
Probability, a priori, 312-313, 321, 324

Probability, a posteriori, 322, 324

Probability function, SEE Normal distribution
Probability of joint occurrence, 319-320

Probable error, 293-294

Product moment, SEE Covariance, 350

Product moments, higher, 554 
Proportion, 193, 194, 196, 198
Proportion, moments of a, 545 

Proportions, correlation and 
regressions of, 545-548 

Pseudo-temporal series, 64 

Publication, decimal places to 
be kept in, 222-223 

Quadratic mean, 235

Quadrature, 697

Qualitative series, 62-63, 67, 72-74, 150-152, 153-154, 194-197, 333

Qualitative-spatial series, 63, 67

Qualitative-temporal series, 62-63, 67

Quantification of qualitative data, 311-313

Quantitative series, 63, 67, 72, 73, 74, 120-149, 154, 194-197, 333-334, 336

Quartile deviation, 231-232, 294

Quetelet, L. A. J., 277-279
Quotients, 501-504

r x 10, SEE Co r re I at ion 
Radians, 598 

Random sampling numbers, 621 
Range, SEE Variability in range

Rank correlation, 365-370, 396-398

Ratio, 70, 198 
Rational function, 697 
Rational integral function, 697 
Ratio test for convergence, 697

Read, Cecil B., 159

Reduced measures, 344 

Regression, 332-395 

Regression, nonlinear, 333, 338, 449-452

Regression coefficient, variance error of, 35?

Regression line, trustworthiness of a point upon, 343-345
Rejection region, 559

Relative time chart, 100, 104, 107, 109

Reliability, triad formula for, 420-421

Reliability coefficient, 403, 
404, 423-427 

Reliability of measurement, 
420-425 

Reliability step-up formula, 
406, 407, 408 

Residual, 18

Retrospective chart, 107 
Reversion, 338 , 341 
Rhind, A., 508, 614
rho = $\rho$, 365-370

rho, corrected for ties in rank, 368-370

Richardson, M. W., 404

Rietz, Henry Lewis, 69, 453, 507, 513

Robinson, G., 529, 589, 610
Romanovsky, V., 316, 360, 581-582

Rotated variables, functions 
of, 604-605, 619 

Rounding off answers, 222-223 

Rulon, Phillip J., 284-285, 
403, 405 

Saffir, Milton, 381, 617
Salisbury, F. S., 457
Salvosa, L. R., 252, 517

Sample, 4-7, 194-195, 33?, 355, 473

Sample number, 556


Sampling, 5, 6-7, 194-195, 33?, 355, 473

Satakopan, V., 620-621
Scarborough, James B., 45, 589

Scatter diagram, 340
Scedasticity, 342
Score, 53, 125-127
Seasonal fluctuations, 70-71
Secrist, Horace, 134
Segmented bar diagram, 151

Semi-interquartile range, 231-232

Semi-logarithmic chart, 110-115

Semi-invariants, 211

Sequential analysis, 555-570, 
572 

Series, statistical, 4-5, 62-82
geographic, 62-63, 66, 67, 72

qualitative, 62-63, 67, 72, 73

quantitative, 63, 67, 73-74
spatial, 62-63, 66, 67, 72
temporal, 62-63, 67, 69-70

Shading, 155 

Shen, Eugene, 407, 420-421 
Sheppard, W. F., 218, 613, 623

Sheppard's corrections, 218, 361, 392

Shrinkage in r, 333, 450-452, 468, 474

Significant figures, 38-45

Simultaneous equations, solution of, 580-581, SEE also Doolittle method

Sinusoidal interrupted projection, 166

Skewed distribution, 134-140 

Skewness, 195, 237, 24?, 248-252, 269-271, 316

Skewness, Fisher, 248-249 
Skewness, Pearson, 248-249 
Slope, 349

Smith, B. Babington, 529, 615
Smith, James G., 4

Smithsonian mathematical tables, 624

Snedecor, George W., 310

Soper, H. E. , 375 

Space, two dimensions, 603-605 

Space, three or more dimensions, 
605-606 

Spatial series, 62-63, 66-67, 72

Spearman, 367-368, 405, 406, 412, 696

Spearman's correction for attenuation, 412, 617

Spearman's foot-rule formula for correlation, 367

Spearman's rank formula for correlation, 367, 617

Spearman's reliability coefficient, 405, 617

Special purpose table, 87, 88, 91-93, 94

Split test reliability, 402, 405

Spot maps, 198

Square contingency, 302

Square roots, 543-544, 612, 617, 621, 627, 652-656

SRG REPORT NO. 255, 570 
Stable features, 185, 198
Stabler, E. R., 529

Standard deviation, 201-202, SEE also Variance

Standard deviation of mean, 207-210

Standard error, SEE also Variance error

Standard error, 294

Standard error of estimate, 364, 403, 407, 592-593

Standardized test, 364-365

Standard scores, 216, 293, 333, 454

Statistic, 53-54



Statistics of attributes, 311-331

Statistical analysis, 13, 22-24
Statistical processes, 74-79
Statistical Research Group, 559, 570
Statistical series, 3-5, 62-83

Statistical series, geographic, 62-63, 66, 67, 72

Statistical series, qualitative, 62-63, 67, 72, 73

Statistical series, quantitative, 63, 67, 73-74

Statistical series, spatial, 62-63, 66, 67, 72

Statistical series, temporal, 62-63, 67, 69-70

Statistical tables, 84-97
Stereoscopic presentation, 153, 154

Stevens, Stanley Smith, 529 
Stippling, 155 

Stirling's approximation to the factorial, 609-610

Stochastic series, 312
Stoessiger, B., 616

Stub, 86, 87, 89-90
"Student's" t, 284-286, 306-310, 320, 357, 620, 625
Subordination, 498
Summation method for computation of the mean and of the moments, 220-222
Summarizing statistics, 57, 73-74

Systematic error, 41 

t, SEE "Student's" t

Tables, 612-658 
Tables, expanding by interpolating, 601-603
Tabulation of data, 76-77

Taylor's series, 610

Temporal series, 62-63, 69-70, 101-120, 193, 482-497
Terminal statistics, 79, 233, 248

Test of variability, 576

Tetrachoric correlation, SEE Correlation, tetrachoric

Tetrad, 421-422
Tetrad difference, 421-422
Thiele, T. N., 212, 264
Thompson, A. J., 615
Thompson, Catherine M., 621, 623, 625

Thomson, Godfrey H., 120, 696
Thorndike, E. L., 272, 282-283

Three-dimensional portrayal, 
166-172 

Thurstone, L. L. , 59, 3 8 % 617, 
696 

Time chart, 100, 103-110
Time limit test, 272
Time ratios, 115

Time reversal test of an index, 268-269

Tippett, L. H. C., 529-531, 615, 616

Tolley, H. R., 458

Transformation, 51, 56, 193, 195, 198, 297-298, 340-341

Transformation of F, 310, 325-331

Transformations, cube root, 599 
Transformations, square root, 
598-599 

Transformations of percentage positions into scores, 592-598

Transformations of ranks into equally reliable scores, 592-598

Transformations of ranks into scores, 592-598

Transformation to produce linear regression, 335
Transforming rank and percentage positions into quantitative scores, 592-598


Transmutation, 56, 340-341
Trend, 70, 193, 483 
Triad, 420 

Trigonometric functions, 603 , 
613, 619, 621, 624-625 

True ability statistics, 40? - 
408-424 

Trustworthiness of various 
measures, 236-238 

Truth, 197 

Two-point distribution, 313-314
Two-sided alternative, 567 


Unimodal, or ” 1 n or “A** curves, 238, 240

U.S., subdivisions of, 155-157

U.S. Bureau of Standards (Federal Works Agency; Works Project Administration for the State of New York), 289

U.S. Census, 16th Abstract, 156-157


Vantage point, 167 
Variability, 15, 195, 199-233 

Variability in range, effect 
of, 425, 432, 616, 625 

Variables, types of, 336 
Variance, 15, 199

Variance, analysis of, 332, 345-346, 354-358, 438, 440, 441, 446-453, 455-456, 468-469, 470-471, 475, 477-478, 594-595

Variance, computation of, 216-222

Variance error, derivation of, 523-529

Variance error, derivation of, binomial expansion method, 523

Variance error, derivation of, differential method, 524



